Requirements for the Degree of Doctor of Philosophy

BAYESIAN  PREDICTION  IN  MIXED  LINEAR  MODELS 
WITH  APPLICATIONS  IN  SMALL  AREA  ESTIMATION 

BY 

GAURI  SANKAR  DATTA 

August,  1990 

Chairman:   Dr.  Malay  Ghosh 
Major  Department:   Statistics 

Small area estimation has gained considerable popularity
in recent years. Government agencies in the United States
and  Canada  have  been  involved  in  estimating  unemployment 
rates,  per  capita  income,  crop  yield,  etc.  simultaneously 
for  many  state  and  local  government  regions.   Typically, 
only  a  few  samples  are  available  from  an  individual  area. 
Consequently,  reliable  estimators  of  "parameters,"  such  as 
the  mean  or  the  variance  for  the  area,  need  to  "borrow 
strength"  from  similar  neighboring  areas  implicitly  or 
explicitly  through  a  model.   Such  estimators  usually  have  a 
smaller  mean  squared  error  of  prediction  than  the  survey 
estimators.

In  this  dissertation,  a  general  hierarchical  Bayes 
(HB)  model  is  considered  for  small  area  estimation.   Some 
of  the  widely  used  models  in  small  area  estimation 
including  the  nested  error  regression  model,  random 
regression coefficients model, etc., considered by earlier
authors are seen to be special cases of the proposed
general model. The predictive distribution of a
characteristic of interest for the unsampled population
units is found given the observations on the sampled units
and is used to draw inference. In particular, simultaneous
estimators of several small area means and variances are
developed. A mixed linear model with a noninformative prior
for the regression coefficients (or fixed effects) and
independent gamma priors (possibly noninformative) for the
inverses of the variance components is used.

In  a  special  case  of  this  HB  analysis,  when  the  vector 
of  the  ratios  of  the  variance  components  is  known,  the  HB 
predictor of the vector of means in finite population
sampling is shown to possess some frequentist optimality
properties (such as best unbiased predictor, best
equivariant predictor, etc.), essentially under elliptical
symmetry assumptions.

Performance of this HB predictor is evaluated by
comparing its Bayes risk with that of the subjective Bayes
predictor with a "true" or "elicited" prior for the unknown
superpopulation parameters. It is shown that, under a
balanced one-way random effects model with covariates and
average squared error loss, the difference in the Bayes
risks of the HB predictor and the "true" Bayes predictor of
the finite population mean or variance vector approaches
zero as the number of small areas becomes increasingly
large.



CHAPTER   ONE 
INTRODUCTION 

1.1   Literature  Review 
The use of linear models by astronomers for predicting the
positions of celestial bodies goes back several centuries.
Since then, the use of model-based inference
for prediction has received considerable attention. In
particular, animal and plant breeders have used such models
for predicting some characteristics of future progeny.
Starting with the pioneering work of Henderson (1953),
considerable attention has been devoted to this problem.
We refer to Gianola and Fernando (1986) and Harville (in
press), where other references are cited. On the other
hand, survey analysts have used the model-based approach in
finite population sampling with the goal of predicting
certain characteristics of the unsampled units in the
population on the basis of the observed sample. Early work
on this topic may be found in Cochran (1939, 1946), where
the finite population is viewed as a realization from a
hypothetical superpopulation.

In recent years, small area (domain) estimation has
grown into an important topic in survey sampling. Small
area statistics were in use as early as the 11th
century in England and the 17th century in Canada (see
Brackstone, 1987). However, these early small area
statistics were based on data obtained by complete
enumeration.

Owing to limited resources and the
advent of sophisticated statistical methodologies, for the
past  few  decades,  sample  surveys,  for  most  purposes,  have 
been  widely  used  as  the  means  of  data  collection  in 
contrast  to  complete  enumeration.   The  data  collected  from 
these  surveys  have  been  very  effectively  used  to  provide 
suitable  statistics  at  the  national  and  state  levels  on  a 
regular  basis.   However,  the  use  of  survey  data  in 
sublevels  below  the  state  level  (for  example,  county  or 
other  subdivision)  was  limited  because  the  estimates  for 
these  small  areas  usually  were  based  on  small  samples  and 
produced  unacceptably  large  standard  errors  and 
coefficients  of  variation.   To  improve  the  reliability  of 
the  small  area  statistics,  it  is  necessary  to  have  a  much 
larger  sample  size  for  an  individual  area  than  can  be 
afforded  with  the  limited  resources  available. 
Consequently,  the  use  of  survey  data  (possibly  in 
association  with  the  census  data)  in  producing  reliable 
small  area  statistics  did  not  receive  much  attention. 


During  the  last  few  years,  many  countries,  including 
the  United  States  and  Canada,  have  recognized  the 
importance of small area estimation. Recently, there has been
growing concern among several governments with the issues
of  distribution,  equity  and  disparity.   There  may  exist 
subgroups  within  a  given  population  which  are  far  below  the 
average  in  certain  respects,  thereby  necessitating  remedial 
action  on  the  part  of  the  government.   Before  taking  such 
an  action,  there  is  a  need  to  identify  such  subgroups,  and 
accordingly,  the  statistical  data  at  the  relevant  subgroup 
levels  must  be  available.   So,  different  government 
agencies  like  the  Census  Bureau,  Bureau  of  Labor 
Statistics,  Statistics  Canada  and  Central  Bureau  of 
Statistics  of  Norway  have  been  involved  in  obtaining 
estimates  of  population  counts,  adjustment  factors  to 
census  counts,  unemployment  rates,  per  capita  income,  etc. 
for  state  and  local  government  areas. 

In  the  face  of  this  problem,  small  area  estimation 
techniques  have  emerged  that  "borrow  strength"  from  similar 
neighboring  areas  for  estimation  and  prediction  purposes. 
Through  use  of  some  appropriate  model  and  auxiliary 
information  (possibly  obtained  through  complete 
enumeration,  for  example,  census  or  satellite),  small  area 
estimators  of  the  parameters  of  interest  (such  as  the 
finite population mean, variance, etc.) usually improve
over the survey estimators. For a good review of the small
area  estimation  literature  one  may  refer  to  Ghosh  and  Rao 
(1990). 

The  necessity  of  "borrowing  strength"  has  been 
realized  by  many  statisticians.   Ericksen  (1974)  advocated 
the use of the regression method for estimating population
changes  of  local  areas.   Fay  and  Herriot  (1979)  proposed  an 
adaptation  of  the  James-Stein  estimator  to  survey  estimates 
of  income  for  small  areas.   Survey  estimates  being  based  on 
a small sample size (typically 20 percent of a
population of size less than 1000) usually have large
standard  errors  and  coefficients  of  variation.   To  rectify 
this,  these  authors  first  fit  a  regression  equation  to  the 
census  sample  estimates,  using  as  independent  variables  the 
county  values,  tax  return  data  for  the  year  1969  and  data 
for  housing  from  the  1970  census.   The  estimate  they 
provided  for  each  place  was  a  weighted  average  of  the 
sample estimate and the regression estimate. Battese,
Harter and Fuller (1988) considered prediction of areas
under corn and soybeans for 12 counties in north-central
Iowa based on the 1978 June Enumerative Survey and LANDSAT
satellite data. Battese, Harter and Fuller (BHF) used a
linear regression model defining a relationship between the
survey and satellite data and used this relationship to
obtain predictors of mean crop areas per segment in the
sampled counties. Fuller and Harter (1987) also considered
a multivariate extension of this model.
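
As a rough illustration of the weighted-average idea attributed above to Fay and Herriot (1979), the following sketch combines a direct sample estimate with a regression estimate. The function name, the argument names and the particular weight (model variance divided by the sum of model and sampling variances) are assumptions made for illustration; they are not taken from the original article.

```python
def composite_estimate(direct, regression, sampling_var, model_var):
    """Weighted average of a direct survey estimate and a regression estimate.

    Assumed (hypothetical) weight: model_var / (model_var + sampling_var),
    so a noisy direct estimate is pulled more strongly toward the regression.
    """
    if sampling_var < 0 or model_var < 0:
        raise ValueError("variances must be non-negative")
    total_var = model_var + sampling_var
    if total_var == 0:
        raise ValueError("at least one variance must be positive")
    weight = model_var / total_var  # weight given to the direct estimate
    return weight * direct + (1.0 - weight) * regression
```

With equal variances the two estimates are simply averaged; as the sampling variance shrinks to zero the direct estimate is returned unchanged.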

There  is  a  similar  problem  of  prediction  faced  by  the 
animal  breeders.   For  the  purpose  of  selecting  the  best 
animals  for  future  breeding,  they  need  to  come  up  with  an 
index  for  each  animal  under  consideration.   Henderson 
(1953,  1975)  advocated  the  use  of  best  linear  unbiased 
predictor  (BLUP)  of  certain  linear  combinations  of  fixed 
and  random  effects  using  a  mixed  linear  model.   Harville 
(in  press)  used  a  mixed  linear  model  for  predicting  the 
average  weight  of  single-birth  male  lambs  which  are  progeny 
of  sires  belonging  to  different  population  lines  and  dams 
belonging  to  different  age  categories.   Harville  and  Fenech 
(1985)  considered  this  example  for  estimating  the 
heritability.

Yet  other  problems  based  on  a  linear  model  come  up  in 
varietal  trials  and  comparative  experiments.   In 
comparative  experiments,  several  treatments  have  to  be 
compared  and  their  effects  or  some  suitable  contrasts  have 
to be estimated. Multicentered clinical trials are good
examples  of  comparative  experiments  (see  Fleiss,  1986). 
Problems  of  this  type  and  the  ones  mentioned  in  the 
previous  paragraph  can  be  viewed  as  arising  in  an  infinite 
population framework as opposed to the problems in finite
population sampling.

The methods that have usually been proposed in model-based
inference use either a variance components approach
or an empirical Bayes (EB) approach, although as pointed
out by Harville (1988, in press), the distinction between
the  two  is  often  superfluous.   Both  these  procedures  use 
certain  mixed  linear  models  for  prediction  purposes. 
First,  assuming  the  variance  components  to  be  known, 
certain  BLUPs  or  EB  predictors  are  obtained  for  the  unknown 
parameters of interest. Then the unknown variance
components are estimated, typically by Henderson's method of
fitting of constants or the restricted maximum likelihood
(REML)  method.   The  resulting  estimators,  which  can  be 
called  estimated  BLUPs  or  EBLUPs  (see  Harville,  1977),  are 
used  for  final  prediction  purposes. 
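
The two-step recipe above can be sketched for the simplest case. The example below assumes a balanced one-way random effects model without covariates, and it swaps in ANOVA-type moment estimators of the variance components in place of Henderson's method or REML; the function name and this choice of estimator are illustrative assumptions, not the procedure of any cited author.

```python
from statistics import mean


def eblup_balanced_one_way(groups):
    """Estimated BLUPs of the area means mu + v_i in a balanced one-way model.

    Step 1: for known variance components the BLUP shrinks each area mean
            toward the grand mean.
    Step 2: the unknown variance components are replaced by moment estimates.
    """
    m = len(groups)
    n = len(groups[0]) if m else 0
    if m < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 areas with the same size n >= 2")

    area_means = [mean(g) for g in groups]
    grand_mean = mean(area_means)

    # Within-area and between-area mean squares.
    msw = sum((y - a) ** 2 for g, a in zip(groups, area_means) for y in g) / (m * (n - 1))
    msb = n * sum((a - grand_mean) ** 2 for a in area_means) / (m - 1)

    sigma2_e = msw
    sigma2_v = max(0.0, (msb - msw) / n)  # truncated at zero

    # Shrinkage factor of the BLUP, with estimates plugged in.
    shrink = sigma2_v / (sigma2_v + sigma2_e / n) if sigma2_v > 0 else 0.0
    predictors = [grand_mean + shrink * (a - grand_mean) for a in area_means]
    return predictors, sigma2_v, sigma2_e
```

When the estimated between-area variance is zero, every area is predicted by the grand mean; when it is large relative to the within-area variance, the predictors stay close to the individual area means.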

The empirical Bayes approach in small area estimation was
first used in Fay and Herriot (1979), and later also used
by Ghosh and Meeden (1986) and Ghosh and Lahiri (1987a, 1988),
among others. According to this procedure, first a Bayes
estimate  of  the  unknown  parameter  of  interest  is  obtained 
by  using  a  normal  prior  or  using  a  linear  Bayes  argument 
(Hartigan,  1969).   The  unknown  parameters  of  the  prior  are 
then  estimated  by  some  classical  methods  like  the  method  of 
moments,  method  of  maximum  likelihood  or  some  combination 
thereof.   The  resulting  estimator  of  the  parameter  of 
interest  is  the  so-called  EB  estimator. 


Although  the  above  approach  of  EBLUP  or  EB  is  usually 
quite  satisfactory  for  point  prediction,  it  is  very 
difficult  to  estimate  the  standard  errors  associated  with 
these  predictors.   This  is  primarily  due  to  the  lack  of 
closed  form  expressions  for  the  mean  squared  errors  (MSEs) 
of  the  EBLUPs  or  the  EB  predictors.   Kackar  and  Harville 
(1984)  suggested  an  approximation  to  the  MSEs  (also 
Harville, 1985, 1988, in press; Harville and Jeske, 1989).
Prasad  and  Rao  (1990)  proposed  estimates  of  these 
approximate  MSEs  in  three  specific  mixed  linear  models.  All 
these  approximations  rest  heavily  on  the  normality 
assumption.   Recently,  Lahiri  and  Rao  (1990)  considered 
this  problem,  relaxing  the  normality  assumption,  assuming 
some  moment  conditions  without  the  presence  of  auxiliary 
information.   The  work  of  Prasad  and  Rao  (1990)  suggests 
that  their  approximations  work  well  when  the  number  of 
small  areas  is  sufficiently  large.   It  is  not  clear  though 
how  these  approximations  fare  for  a  small  or  even 
moderately  large  number  of  small  areas. 

Ghosh and Lahiri (in press) proposed an HB procedure
as an alternative to the EBLUP or the EB procedure. In an
HB procedure, if one uses the posterior mean for estimating
the parameter of interest, then a natural estimate of the
standard error associated with this estimator is its
posterior standard deviation (s.d.). The estimate, though
often complicated, can be found exactly via numerical
integration without approximation.

The  model  considered  by  Ghosh  and  Lahiri  (in  press) 
was,  however,  only  a  special  case  of  the  so-called  nested 
error regression model, also used by BHF. A similar model
was  considered  by  Stroud  (1987),  but  his  general  analysis 
was  performed  only  for  the  balanced  case,  that  is  when  the 
number  of  samples  was  the  same  for  each  stratum. 

Other  models  have  also  been  proposed.   In  a  recent 
article,  Choudhry  and  Rao  (1988)  considered  five  specific 
models  for  small  area  estimation  not  included  in  the 
earlier work of Prasad and Rao (1990). Recently, Royall
(1979) and Lui and Cumberland (1989) considered certain
cross-classificatory models for small area estimation. The
latter carried out a Bayesian analysis assuming the
degeneracy of certain terms in a usual two-way linear
model.

For  a  Bayesian  analysis  in  the  context  of  animal 
breeding,  one  may  refer  to  Gianola  and  Fernando  (1986). 
However,  they  did  not  consider  the  HB  analysis.   They  used 
subjective  informative  priors  which  are  constructed  from 
the  previous  data  and  experiments.   Also,  they  showed  how 
some  of  the  classical  measures  in  animal  breeding  can  have 
Bayesian justification. They have also considered
noninformative improper priors.


One important special case arises in the above
approaches and is also important in the theory of least
squares. When the ratios of variance components are known,
the predictors (least squares, empirical Bayes or
hierarchical Bayes) are BLUPs (Henderson, 1963). For
related BLUP results for predicting scalars in finite
population sampling one may refer to Royall (1979), Lui and
Cumberland (1989), Prasad and Rao (1990) and several
others. Harville (1985, 1988, in press) has pointed out
the BLUP properties of scalar Bayesian predictors in general mixed
linear models (see also Harville, 1976). Ghosh and Lahiri
(in press) have extended the scalar BLUP notion of Henderson
and others to show that the Bayesian predictor of the vector of
finite population means is BLUP.

To  conclude  this  discussion,  we  will  briefly  mention 
another  problem.   So  far,  we  have  considered  only  the 
problem  of  estimating  the  mean  in  finite  population 
sampling.   Another  important  problem  in  finite  population 
sampling  is  estimating  the  finite  population  variance. 
Ericson  (1969)  found  the  Bayes  estimator  of  finite 
population  variance  under  a  normal  theory  set  up. 
Empirical  Bayes  estimation  of  finite  population  variance  in 
small  area  estimation  was  considered  by  Ghosh  and  Lahiri 
(1987b)  and  Lahiri  and  Tiwari  (in  press)  without  the 
presence  of  auxiliary  information. 

1.2   The Subject of This Dissertation
In this dissertation, we present a unified Bayesian
prediction theory for linear models in small area
estimation  in  the  context  of  finite  population  sampling.   A 
general  Bayesian  model  is  presented  which  can  be  regarded 
as  an  extension  of  the  HB  ideas  of  Lindley  and  Smith  (1972) 
to  prediction.   This  general  model  can  also  be  applied  in 
infinite  population  situations,  for  example,  in  animal 
breeding  and  in  other  applications  where  a  mixed  linear 
model is used.

In  Chapter  Two,  we  introduce  a  general  HB  model  and 
use  this  model  for  simultaneous  estimation  of  several  small 
area  means  in  finite  population  sampling.   Some  of  the 
widely  used  models  in  small  area  estimation  including  the 
nested error regression model (Battese et al., 1988; Prasad
and Rao, 1990; Stroud, 1987; Ghosh and Lahiri, in press),
the random regression coefficients model (Dempster et al.,
1981; Prasad and Rao, 1990), the cross-classificatory
models (Royall, 1979; Lui and Cumberland, 1989) and
multistage sampling models (Ghosh and Lahiri, 1988; Malec and
Sedransk, 1985; Scott and Smith, 1969) can be regarded as
special cases of our model. The posterior distribution as
well as the resulting posterior means and variances of the
unobserved units in the population given the sample units
are provided in this chapter. Also, for an infinite
population, given the data, the conditional distribution
and the conditional mean and variance of the vector of
effects are provided. These two analyses are applied to
two real data sets. It is worthwhile to mention that
Bayesian analysis in linear models was initiated by Hill
(1965). See also Hill (1977, 1980). For a good exposition
of HB analysis see Berger (1985).

In  Chapter  Three,  a  special  case  of  HB  models 
discussed  in  the  previous  chapter  is  considered.   Based  on 
this  model,  which  assumes  known  ratios  of  variance 
components,  certain  optimal  properties  of  the  HB  predictors 
proposed in this chapter are proved. Although developed
within a Bayesian framework, these results should
appeal also to frequentists. The BLUP notion for real
valued parameters is extended to vector valued parameters,
and it is shown that the Bayesian predictors derived in
this chapter are indeed BLUPs. From this, as a special
case, it follows that the Bayesian predictors of the finite
population mean vector and other linear parameters are
BLUPs as well. Our BLUP result for the finite population
mean vector unifies a number of similar results derived
under specific models (e.g., Royall, 1979; Ghosh and
Lahiri, in press; Lui and Cumberland, 1989; Prasad and Rao,
1990). As in other related articles, our BLUP results do
not require any normality assumption of the model. For a
suitable subclass of elliptically symmetric distributions,
including but not limited to the normal, the HB predictors
are shown to be best unbiased; that is, they have the
smallest variance-covariance matrix within the class of all
unbiased  predictors.   Also,  following  Hwang  (1985),  we  have 
been  able  to  show  that  the  BLUPs  also  "universally"  (or 
"stochastically")  dominate  the  linear  unbiased  predictors 
for  elliptically  symmetric  distributions.   The  notion  of 
"universal"  and  "stochastic"  domination  will  be  made 
precise  in  Chapter  Three.   Also,  it  is  established  that 
under  a  suitable  group  of  transformations,  the  HB 
predictors  are  best  within  the  class  of  all  equivariant 
predictors  for  elliptically  symmetric  distributions.   Jeske 
and  Harville  (1987)  have  shown  that  the  scalar  BLUPs  are 
best  equivariant  within  the  class  of  all  linear  equivariant 
predictors  without  any  distributional  assumption.   However, 
to  our  knowledge,  the  equivariance  results  for  vector 
valued  predictors  have  not  been  addressed  before  in  this 
context  in  their  full  generality. 

In  Chapter  Four,  we  have  established  some  asymptotic 
results  regarding  the  Bayes  risk  performance  of  certain  HB 
predictors  of  the  finite  population  mean  vector.   We  have 
considered  two  specific  models,  namely,  the  random 
regression  coefficients  model  and  the  nested  error 
regression model, introduced in Chapter Two. We have shown
that under average squared error loss the Bayes risk
difference between the HB predictors and the subjective
Bayes  predictors  for  a  "true"  prior  goes  to  zero  as  the 
number  of  small  areas  goes  to  infinity.   This  shows  our  HB 
predictors  are  "asymptotically  optimal"  (A.O.)  in  the  sense 
of  Robbins  (1955).   The  A.O.  property  of  certain  EB 
predictors  arising  naturally  in  the  context  of  finite 
population  sampling  was  proved  in  Ghosh  and  Meeden  (1986), 
Ghosh  and  Lahiri  (1987a)  and  Ghosh,  Lahiri  and  Tiwari 
(1989). 

Chapter  Five  is  devoted  to  the  simultaneous  estimation 
of  several  strata  variances.   We  have  considered  the 
special  cases  of  the  nested  error  regression  model  as 
considered  by  Ghosh  and  Lahiri  (in  press)  and  Stroud  (1987) 
in  detail.   As  in  Chapter  Four,  we  have  proved  the  A.O. 
property  of  these  predictors.   Ghosh  and  Lahiri  (1987b)  and 
Lahiri  and  Tiwari  (in  press)  have  proved  the  A.O.  property 
of  certain  EB  predictors  of  finite  population  variances. 

We  reemphasize  that  the  present  dissertation  provides 
a  unified  Bayesian  analysis  both  in  the  finite  and  infinite 
population  framework.   For  finite  population,  we  unify  a 
number  of  models  considered  earlier  by  different  authors. 
From  the  analysis  of  two  data  sets  we  undertook  in  Chapter 
Two,  it  is  clear  that  the  proposed  procedure  is  a  viable 
alternative  to  the  available  EBLUP  or  EB  procedures.   Also, 
to our knowledge, estimates of MSEs or good approximations
thereof  are  not  available  except  for  a  few  specific  models. 
The  Bayesian  procedures  of  this  dissertation,  on  the  other 
hand,  can  serve  as  a  general  recipe  to  handle  a  greater 
variety  of  problems.   Also,  the  inferential  methods  of  the 
following  chapters  are  implementable  for  data  analysis, 
especially  in  these  days  of  sophisticated  computing 
facilities.




CHAPTER  TWO 
BAYESIAN  PREDICTION  OF  MEANS  IN  LINEAR  MODELS: 

GENERAL  CASE 

2.1   Introduction

In this chapter we will consider two similar but
different problems simultaneously. One problem refers to
the small area estimation problem in the context of finite
population sampling and the other problem deals with the
prediction problem in comparative experiments in the
context of ANOVA, ANOCOVA or linear regression in an infinite
population situation. In both these cases, a mixed linear
model  is  used.   In  the  first  case,  we  are  interested  in 
predicting  some  finite  population  characteristics  (e.g., 
finite  population  totals  or  means)  whereas  in  the  second 
case  we  are  interested  in  predicting  linear  functions  of 
fixed  and  random  effects. 

In the finite population sampling set up, we assume
that there are $m$ strata, the $i$th stratum $U_i$ containing a
finite number of units $N_i$ with units labelled $U_{i1}, \ldots, U_{iN_i}$.
Let $Y_{ij}$ denote some characteristic of interest associated
with the $j$th unit in the $i$th stratum ($j = 1, \ldots, N_i$;
$i = 1, \ldots, m$). We assume that the $Y_{ij}$ are observables in the
sense that we can actually learn the exact value of any
of the $Y_{ij}$ for some finite cost. We are interested in
predicting some linear combinations of these observables
(like the finite population total or mean for each small
area or domain) using a quadratic loss. For notational
convenience, we will denote a sample of size $n_i$ from the
$i$th stratum by $Y_{i1}, Y_{i2}, \ldots, Y_{in_i}$.

On  the  other  hand  in  the  infinite  population  set  up  we 
are  interested  in  predicting  linear  combinations  (in 
particular  contrasts)  of  fixed  and  random  effects.   Note 
that  in  this  set  up  these  quantities  are  not  observables. 
For  this  problem  too,  we  use  a  quadratic  loss  function. 

We  will  use  the  word  predictands  to  refer  to  the 
quantities  we  want  to  predict  in  both  the  problems.   The 
analysis  will  be  done  in  two  stages;  in  the  first  stage  we 
assume  the  ratios  of  the  variance  components  are  known 
whereas  in  the  second  stage  we  consider  the  more  general 
situation  where  all  the  variance  components  are  unknown. 

In  Section  2.2  a  general  HB  model  will  be  described 
and  a  number  of  interesting  examples  arising  in  finite 
population  sampling  or  in  the  infinite  population  set  up 
will  be  considered.   Some  of  the  existing  models  used  in 
the  context  of  finite  population  sampling  are  shown  to  be 
special  cases  of  the  general  model  to  be  proposed  in  this 
section.

Realizing the importance of the problem, the most
general  situation  where  all  the  variance  components  are 
unknown  will  be  considered  in  this  chapter,  whereas  the 
known  ratios  of  variance  components  will  be  considered  in 
the  next  chapter.   This  general  situation  is  considered  in 
Section  2.3.   A  general  mixed  linear  model  is  considered 
and  some  prior  distribution  is  assigned  to  all  unknown 
parameters  which  consist  of  the  vector  of  fixed  effects  and 
the  variance  components.   In  the  first  part  of  Section  2.3, 
for  the  model  introduced  in  Section  2.2,  we  have  found  the 
posterior  (predictive)  distribution  of  the  characteristic 
of  interest  of  the  nonsampled  population  units  given  the 
values  of  that  characteristic  for  the  sample  units  in 
finite  population  sampling.   Also  the  posterior  mean  vector 
and posterior variance-covariance matrix corresponding to
the  characteristic  vector  of  nonsampled  units  are  obtained 
from  this  predictive  distribution.   In  particular,  the 
posterior  means  and  the  variances  of  the  finite  population 
means  for  all  the  small  areas  are  obtained.   In  the  second 
half  of  Section  2.3,  we  have  obtained  the  posterior 
distribution  of  the  vector  of  fixed  and  random  effects  for 
the  model  introduced  in  Section  2.2.   In  particular, 
expressions  for  the  posterior  means  and  variances  of 
certain  predictands  are  developed. 

In Section 2.4, we have applied the results of Section
2.3 to some actual data sets. First, we shall consider the
corn and soybeans data which appeared in Battese, Harter
and  Fuller  (1988).   Using  the  HB  analysis  developed  in 
Section  2.3,  we  have  derived  the  posterior  means  and 
posterior  standard  deviations  for  the  12  small  area 
(county)  means.   The  second  data  set  containing  the  weights 
of 62 single-birth lambs appeared in Harville (in press).
This  is  analyzed  by  HB  methods  for  an  infinite  population 
set  up  developed  in  the  second  half  of  Section  2.3. 

Finally,  in  Section  2.5  an  HB  analysis  of  the  model 
considered  by  Carter  and  Rolph  (1974)  and  subsequently  by 
Fay  and  Herriot  (1979)  to  estimate  the  per  capita  income  of 
small  places  is  considered.   In  this  situation  unit  level 
observations  are  not  available  and  we  are  interested  in 
predicting  the  finite  population  mean  for  each  small  area. 
Here  the  sampling  variances  are  different  and  are  assumed 
to  be  known;  also,  we  put  a  uniform  prior  on  the  regression 
coefficients  and  a  gamma  prior  (proper  or  improper)  on  the 
inverse  of  the  prior  variance. 

2.2   Description of the Hierarchical Bayes Model with Examples

Consider the following Bayesian model:

(A) conditional on $b = (b_1, \ldots, b_p)^T$, $v = (v_1, \ldots, v_q)^T$, $\lambda = (\lambda_1, \ldots, \lambda_t)^T$ and $r$, let
$$Y \sim N(Xb + Zv, \; r^{-1}\Psi);$$

(B) conditional on $b$, $\lambda$ and $r$, let
$$v \sim N(0, \; r^{-1}D(\lambda));$$

(C) $B$, $R$ and $\Lambda$ have a certain joint prior distribution,
proper or improper.

Stages (A) and (B) of the model can be identified as a
general mixed linear model. To see this, write
$$Y = Xb + Zv + e, \tag{2.2.1}$$
where $b$ is the vector of fixed effects, $e$ and $v$ are
mutually independent with $e \sim N(0, r^{-1}\Psi)$ and
$v \sim N(0, r^{-1}D(\lambda))$; $X$ and $Z$ are known design matrices, $\Psi$ is a
known positive definite (p.d.) matrix, while $D(\lambda)$ is a p.d.
matrix which is structurally known except possibly for some
unknown $\lambda$. In the examples to follow, $\lambda$ involves the
ratios of variance components.
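
Because e and v are independent in (2.2.1), integrating out v leaves a covariance for Y equal to 1/r times the sum of the known p.d. matrix and Z D Z^T. The sketch below builds this matrix for one simple specialization, which is an assumption made here for illustration: the known p.d. matrix is taken to be the identity (called `psi` in the comments), D is taken to be the identity divided by a single scalar `lam`, and Z maps each unit to its area. All names in the code are hypothetical.

```python
def marginal_covariance(area_sizes, r, lam):
    """Covariance of Y after integrating out v in Y = Xb + Zv + e.

    Assumptions (illustrative): psi = I, D = I / lam, and Z is the
    unit-to-area indicator matrix, so the result is
    (1 / r) * (I + Z Z^T / lam), returned as a list of rows.
    """
    if r <= 0 or lam <= 0:
        raise ValueError("r and lam must be positive")
    # Area label of every unit, in stacked order.
    labels = [i for i, size in enumerate(area_sizes) for _ in range(size)]
    total = len(labels)
    cov = [[0.0] * total for _ in range(total)]
    for a in range(total):
        for c in range(total):
            value = 0.0
            if a == c:
                value += 1.0            # contribution of psi = I
            if labels[a] == labels[c]:
                value += 1.0 / lam      # contribution of Z D Z^T
            cov[a][c] = value / r
    return cov
```

Units in the same area are positively correlated through the shared random effect, while units in different areas are uncorrelated; this is the structure that allows information to be pooled within an area.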

In the context of small area estimation, partition
$Y$ ($N_T \times 1$), $X$ ($N_T \times p$), $Z$ ($N_T \times q$) and $e$ ($N_T \times 1$) conformably, and
rewrite the model given in (2.2.1) as
$$\begin{pmatrix} Y^{(1)} \\ Y^{(2)} \end{pmatrix} = \begin{pmatrix} X^{(1)} \\ X^{(2)} \end{pmatrix} b + \begin{pmatrix} Z^{(1)} \\ Z^{(2)} \end{pmatrix} v + \begin{pmatrix} e^{(1)} \\ e^{(2)} \end{pmatrix}. \tag{2.2.2}$$

In the above partition, $Y^{(1)}$ ($n_T \times 1$) corresponds to the
vector of sampled units from $m$ small areas or strata, while
$Y^{(2)}$ ($(N_T - n_T) \times 1$) corresponds to the vector of unsampled
units. We will further partition $Y^{(1)T}$ into
$Y^{(1)T} = (Y_1^{(1)T}, \ldots, Y_m^{(1)T})$, where the $n_i$-component vector
$Y_i^{(1)} = (Y_{i1}, \ldots, Y_{in_i})^T$ is the vector of sampled units from the $i$th
small area. Similarly, $Y^{(2)T}$ can be partitioned into
$(Y_1^{(2)T}, \ldots, Y_m^{(2)T})$, where the $(N_i - n_i)$-component vector
$Y_i^{(2)} = (Y_{i,n_i+1}, \ldots, Y_{iN_i})^T$ is the vector of unsampled units for
the $i$th small area. One of our primary objectives in small
area estimation is to estimate the finite population mean
vector $\gamma = (\gamma_1, \ldots, \gamma_m)^T$, where
$$\gamma_i = \sum_{j=1}^{N_i} Y_{ij} / N_i, \quad i = 1, \ldots, m.$$

More generally, we may be interested in predicting the
vector of linear combinations $AY^{(1)} + CY^{(2)} = \xi(Y^{(1)}, Y^{(2)})$
(say) for known matrices $A$ ($u \times n_T$) and $C$ ($u \times (N_T - n_T)$). For
this purpose it suffices to find the predictive
distribution of $Y^{(2)}$ given $Y^{(1)} = y^{(1)}$. In the next section this
will be accomplished by using the model-based approach in
survey sampling.

Before we consider the other problem in the infinite population set up, we identify some of the existing models introduced for small area estimation by several authors as special cases of (2.2.2). In what follows, we shall use the notation $I_u$ for a $u \times u$ identity matrix, $1_u$ for a $u$-component column vector with each element equal to 1, $J_{u,v}$ for the $u \times v$ matrix $1_u 1_v^T$ and $J_u$ for the $u \times u$ matrix $J_{u,u}$. Also, let $\operatorname{col}_{1 \le i \le k}(B_i)$ denote the matrix $(B_1^T, \ldots, B_k^T)^T$, and let $\bigoplus_{i=1}^{k} A_i$ denote the matrix

$$\begin{pmatrix} A_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & A_k \end{pmatrix}.$$

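First, consider the nested error regression model

$$Y_{ij} = x_{ij}^T b + v_i + e_{ij} \quad (j = 1, \ldots, N_i;\; i = 1, \ldots, m). \tag{2.2.3}$$

The model was considered by Battese, Harter and Fuller (1988). They assumed the $v_i$ and $e_{ij}$ to be mutually independent with $v_i$ iid $N(0, (\lambda r)^{-1})$ and $e_{ij}$ iid $N(0, r^{-1})$. In this case

$$X^{(1)} = \operatorname{col}_{1 \le i \le m}\Big(\operatorname{col}_{1 \le j \le n_i}(x_{ij}^T)\Big), \quad X^{(2)} = \operatorname{col}_{1 \le i \le m}\Big(\operatorname{col}_{n_i+1 \le j \le N_i}(x_{ij}^T)\Big),$$

$$Z^{(1)} = \bigoplus_{i=1}^{m} 1_{n_i}, \quad Z^{(2)} = \bigoplus_{i=1}^{m} 1_{N_i - n_i},$$

$\Psi = I_{N_T}$, $t = 1$, $\lambda_1 = \lambda$ and $D(\lambda) = \lambda^{-1} I_m$. In the further special case of Ghosh and Lahiri (in press), $x_{ij} = x_i$ for every $j = 1, \ldots, N_i$, $i = 1, \ldots, m$. Note that $\lambda = V(e_{ij})/V(v_i)$, a ratio of the variance components.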
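
The design matrices of the nested error regression model can be built numerically as a quick check. The following Python sketch is only an illustration: the function name `nested_error_design`, the sample sizes and the value of the variance ratio are hypothetical and do not come from the text. It forms $Z^{(1)}$, $D(\lambda)$ and the matrix $I + Z^{(1)} D(\lambda) Z^{(1)T}$, which by stages (A) and (B) with $\Psi = I$ is $r$ times the covariance matrix of the sampled units given $b$, $\lambda$ and $r$.

```python
import numpy as np

def nested_error_design(n, lam):
    """Return Z^(1), D(lambda) and I + Z^(1) D Z^(1)T for sample sizes n."""
    n = np.asarray(n, dtype=int)
    m = n.size
    # Z^(1) is the direct sum of the vectors 1_{n_i}: one indicator column per area.
    Z1 = np.zeros((n.sum(), m))
    start = 0
    for i, ni in enumerate(n):
        Z1[start:start + ni, i] = 1.0
        start += ni
    D = np.eye(m) / lam                          # D(lambda) = lambda^{-1} I_m
    Sigma11 = np.eye(n.sum()) + Z1 @ D @ Z1.T    # Psi = I for the sampled block
    return Z1, D, Sigma11

n = [1, 2, 3]   # hypothetical sample sizes for m = 3 areas
lam = 2.0       # hypothetical variance ratio V(e_ij) / V(v_i)
Z1, D, Sigma11 = nested_error_design(n, lam)
```

Each diagonal block of `Sigma11` has the form $I_{n_i} + \lambda^{-1} J_{n_i}$, so its determinant is the product of the factors $(\lambda + n_i)/\lambda$.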

The random regression coefficients model of Dempster, Rubin and Tsutakawa (1981) (also Prasad and Rao, 1990) is also a special case of ours. In this set up, $X^{(1)}$, $X^{(2)}$, $\Psi$ and $D(\lambda)$ are the same as in the nested error regression model, but

$$Z^{(1)} = \bigoplus_{i=1}^{m} \operatorname{col}_{1 \le j \le n_i}(x_{ij}) \quad \text{and} \quad Z^{(2)} = \bigoplus_{i=1}^{m} \operatorname{col}_{n_i+1 \le j \le N_i}(x_{ij}).$$

Some of the models of Choudhry and Rao (1988) can also be treated as special cases of ours. For example, one of their models is given by

$$Y_{ij} = b x_{ij} + v_i + e_{ij}^{*} x_{ij}^{1/2} \quad (j = 1, \ldots, N_i;\; i = 1, \ldots, m) \tag{2.2.4}$$

with $v_i$ iid $N(0, (r\lambda)^{-1})$ and $e_{ij}^{*}$ iid $N(0, r^{-1})$. Here $p = 1$, $b_1 = b$, $X^{(1)}$ and $X^{(2)}$ are the same as in the nested error regression model with the vector $x_{ij}$ replaced by the scalar $x_{ij}$; $Z^{(1)}$, $Z^{(2)}$ and $D(\lambda)$ are the same as in the nested error regression model, but $\Psi = \operatorname{Diag}(x_{11}, \ldots, x_{1N_1}, \ldots, x_{m1}, \ldots, x_{mN_m})$. Another model considered by these authors is similar to the one given in (2.2.4) with $x_{ij}$ replacing $x_{ij}^{1/2}$ as multipliers of $e_{ij}^{*}$. Yet another model considered by Choudhry and Rao (1988) is

$$Y_{ij} = b x_{ij} + v_i x_{ij}^{1/2} + e_{ij}^{*} x_{ij}^{1/2} \quad (j = 1, \ldots, N_i;\; i = 1, \ldots, m), \tag{2.2.5}$$

with the $v_i$ and $e_{ij}^{*}$ having the same distribution as in the previous model. Here $\Psi = \operatorname{Diag}(x_{11}, \ldots, x_{1N_1}, \ldots, x_{m1}, \ldots, x_{mN_m})$, $Z^{(1)} = \bigoplus_{i=1}^{m} u_i^{(1)}$, $Z^{(2)} = \bigoplus_{i=1}^{m} u_i^{(2)}$ with $u_i^{(1)} = (x_{i1}^{1/2}, \ldots, x_{in_i}^{1/2})^T$, $u_i^{(2)} = (x_{i,n_i+1}^{1/2}, \ldots, x_{iN_i}^{1/2})^T$, and $X^{(1)}$ and $X^{(2)}$ the same as in (2.2.4).

It is possible also to include certain cross-classification models as special cases of our general linear model. For example, suppose there are $m$ small areas labelled $1, \ldots, m$. Within each small area, units are further classified into $c$ subgroups (socioeconomic class, age, etc.) labelled $1, \ldots, c$. The cell sizes $N_{ij}$ ($i = 1, \ldots, m$; $j = 1, \ldots, c$) are assumed to be known. Let $Y_{ijk}$ ($k = 1, \ldots, N_{ij}$) denote the measurement on the $k$th individual in the $(i,j)$th cell. Conditional on $b$, $r$ and $\lambda$, suppose

$$Y_{ijk} = x_{ij}^T b + \tau_i + \eta_j + \gamma_{ij} + e_{ijk} \tag{2.2.6}$$

($k = 1, \ldots, N_{ij}$; $i = 1, \ldots, m$; $j = 1, \ldots, c$), with $\tau_i$, $\eta_j$, $\gamma_{ij}$ and $e_{ijk}$ mutually independent with $e_{ijk}$ iid $N(0, r^{-1})$, $\gamma_{ij}$ iid $N(0, (\lambda_3 r)^{-1})$, $\eta_j$ iid $N(0, (\lambda_2 r)^{-1})$ and $\tau_i$ iid $N(0, (\lambda_1 r)^{-1})$. In this case

$$Y^{(1)} = \operatorname{col}_{1 \le i \le m}\Big(\operatorname{col}_{1 \le j \le c}\Big(\operatorname{col}_{1 \le k \le n_{ij}} Y_{ijk}\Big)\Big), \quad Y^{(2)} = \operatorname{col}_{1 \le i \le m}\Big(\operatorname{col}_{1 \le j \le c}\Big(\operatorname{col}_{n_{ij}+1 \le k \le N_{ij}} Y_{ijk}\Big)\Big),$$

$$X^{(1)} = \operatorname{col}_{1 \le i \le m}\Big(\operatorname{col}_{1 \le j \le c}\big(1_{n_{ij}} x_{ij}^T\big)\Big), \quad X^{(2)} = \operatorname{col}_{1 \le i \le m}\Big(\operatorname{col}_{1 \le j \le c}\big(1_{N_{ij}-n_{ij}} x_{ij}^T\big)\Big),$$

$$Z^{(1)} = \big(Z_1^{(1)} \;\; Z_2^{(1)} \;\; Z_3^{(1)}\big), \text{ where } Z_1^{(1)} = \bigoplus_{i=1}^{m} 1_{n_{i\cdot}}, \quad n_{i\cdot} = \sum_{j=1}^{c} n_{ij},$$

$$Z_2^{(1)} = \operatorname{col}_{1 \le i \le m}\Big(\bigoplus_{j=1}^{c} 1_{n_{ij}}\Big), \quad Z_3^{(1)} = \bigoplus_{i=1}^{m}\Big(\bigoplus_{j=1}^{c} 1_{n_{ij}}\Big),$$

and $Z^{(2)}$ is a matrix similar to $Z^{(1)}$ with $(N_{ij} - n_{ij})$ replacing $n_{ij}$ in defining the dimensions of the vectors. Also, $v^T = (\tau_1, \ldots, \tau_m, \eta_1, \ldots, \eta_c, \gamma_{11}, \ldots, \gamma_{mc})$, $t = 3$, $\lambda^T = (\lambda_1, \lambda_2, \lambda_3)$ and $D(\lambda) = \operatorname{Diag}(\lambda_1^{-1} I_m, \lambda_2^{-1} I_c, \lambda_3^{-1} I_{mc})$. Special cases of this model have been considered by several others. Lui and Cumberland (1989) considered a model where $\tau_i$ and $\gamma_{ij}$ are degenerate at zero. Also, they assumed the variance ratio $\lambda_2$ to be known in deriving their estimators, and did not address the issue of unknown $\lambda_2$ appropriately.

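Next we show that the two stage sampling model with covariates and $m$ strata is a special case of our general linear model. Suppose that the $i$th stratum contains $L_i$ primary units. Suppose also that the $j$th primary unit within the $i$th stratum contains $N_{ij}$ subunits. Let $Y_{ijk}$ denote the value of the characteristic of interest for the $k$th subunit within the $j$th primary unit from the $i$th stratum ($k = 1, \ldots, N_{ij}$; $j = 1, \ldots, L_i$; $i = 1, \ldots, m$). From the $i$th stratum, a sample of $\ell_i$ primary units is taken. For the $j$th selected primary unit within the $i$th stratum, a sample of $n_{ij}$ subunits is selected. For notational convenience, denote, without loss of generality, the sample values by $Y_{ijk}$ ($k = 1, \ldots, n_{ij}$; $j = 1, \ldots, \ell_i$; $i = 1, \ldots, m$). Assume conditional on $b$, $r$ and $\lambda$

$$Y_{ijk} = x_{ij}^T b + \xi_i + \eta_{ij} + e_{ijk} \quad (k = 1, \ldots, N_{ij};\; j = 1, \ldots, L_i;\; i = 1, \ldots, m), \tag{2.2.7}$$

where $\xi_i$, $\eta_{ij}$ and $e_{ijk}$ are mutually independent with $\xi_i$ iid $N(0, (\lambda_1 r)^{-1})$, $\eta_{ij}$ iid $N(0, (\lambda_2 r)^{-1})$, $e_{ijk}$ iid $N(0, r^{-1})$. Let

$$Y^{(1)} = \operatorname{col}_{1 \le i \le m}\Big(\operatorname{col}_{1 \le j \le \ell_i}\Big(\operatorname{col}_{1 \le k \le n_{ij}}(Y_{ijk})\Big)\Big), \quad Y^{(2)} = \operatorname{col}_{1 \le i \le m}\Big(\operatorname{col}_{1 \le j \le L_i}\Big(\operatorname{col}_{u_{ij} \le k \le N_{ij}}(Y_{ijk})\Big)\Big),$$

$$u_{ij} = 1 + n_{ij} I_{[j \le \ell_i]}, \quad i = 1, \ldots, m,$$

$$v^T = (\xi^T \;\; w_1^T \;\; w_2^T), \quad \xi = \operatorname{col}_{1 \le i \le m}(\xi_i), \quad w_1 = \operatorname{col}_{1 \le i \le m}\Big(\operatorname{col}_{1 \le j \le \ell_i}(\eta_{ij})\Big) \quad \text{and} \quad w_2 = \operatorname{col}_{1 \le i \le m}\Big(\operatorname{col}_{\ell_i+1 \le j \le L_i}(\eta_{ij})\Big).$$

Also, let $e^{(i)}$ be defined similarly as $Y^{(i)}$, $i = 1, 2$. Then (2.2.7) can be written as (2.2.2) with

$$X^{(1)} = \operatorname{col}_{1 \le i \le m}\Big(\operatorname{col}_{1 \le j \le \ell_i}\big(1_{n_{ij}} x_{ij}^T\big)\Big), \quad X^{(2)} = \operatorname{col}_{1 \le i \le m}\Big(\operatorname{col}_{1 \le j \le L_i}\big(1_{r_{ij}} x_{ij}^T\big)\Big), \text{ where } r_{ij} = N_{ij} - u_{ij} + 1;$$

$$Z^{(1)} = \big(S^{(1)} \;\; W^{(1)}\big), \quad S^{(1)} = \bigoplus_{i=1}^{m} 1_{n_{i\cdot}}, \quad n_{i\cdot} = \sum_{j=1}^{\ell_i} n_{ij},$$

$$W^{(1)} = \big(W_1^{(1)} \;\; W_2^{(1)}\big), \text{ where } W_1^{(1)} = \bigoplus_{i=1}^{m}\Big(\bigoplus_{j=1}^{\ell_i} 1_{n_{ij}}\Big), \quad W_2^{(1)} = 0_{n_T \times (L_\cdot - \ell_\cdot)},$$

$$n_T = \sum_{i=1}^{m} n_{i\cdot}, \quad L_\cdot = \sum_{i=1}^{m} L_i, \quad \ell_\cdot = \sum_{i=1}^{m} \ell_i;$$

$$Z^{(2)} = \big(S^{(2)} \;\; W^{(2)}\big), \quad S^{(2)} = \bigoplus_{i=1}^{m} 1_{R_i}, \quad R_i = \sum_{j=1}^{L_i} N_{ij} - n_{i\cdot} \quad \text{and} \quad W^{(2)} = \bigoplus_{i=1}^{m}\Big(\bigoplus_{j=1}^{L_i} 1_{r_{ij}}\Big).$$

Here $t = 2$, $\lambda = (\lambda_1, \lambda_2)^T$, $\Psi = I_{N_T}$, $D(\lambda) = \operatorname{Diag}(\lambda_1^{-1} I_m, \lambda_2^{-1} I_{L_\cdot})$ with $N_T = \sum_{i=1}^{m} \sum_{j=1}^{L_i} N_{ij}$. The ideas can be extended directly to multistage sampling with more complicated notations. We may mention here that Bayesian analysis for two stage sampling was introduced first by Scott and Smith (1969) in a much simpler framework. A multistage analog of their work was provided by Malec and Sedransk (1985). Ghosh and Lahiri (1988) considered empirical Bayes estimation in multistage sampling.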
Now we will consider the infinite population set up. In this context, we will use the model given by (A), (B) and (C) with the mixed linear model representation given by (2.2.1). Here we will assume the data vector $Y$ is $n_T \times 1$ and the associated design matrices $X$ and $Z$ are $n_T \times p$ and $n_T \times q$ respectively. Without loss of generality, we also assume that rank($X$) $= p$. Our objective is to predict $Sb + Tv = \zeta(b, v)$ (say) on the basis of $Y$ where $S$ ($u \times p$) and $T$ ($u \times q$) are known matrices. Following the model-based inference, it suffices to find the posterior (conditional) distribution of $(b^T, v^T)^T = W$ (say) given $Y = y$. This conditional distribution is provided in the next section.

We will conclude this section by discussing a few specific models in the context of comparative trials and animal breeding which are special cases of the general model proposed in (2.2.1).

First consider a multicentered clinical trial which is conducted in $c$ participating clinics to compare two treatments, one already existing in the market and the other newly developed. Suppose there are $n_{ij}$ subjects receiving the $i$th treatment in the $j$th clinic. Some of the $n_{ij}$ could be zero. We are interested in estimating the treatment difference. For this example, we will use a mixed linear model for $Y_{ijk}$, some measured response corresponding to the $k$th subject receiving the $i$th treatment in the $j$th participating clinic. We consider the model:

$$Y_{ijk} = \mu_i + c_j + \gamma_{ij} + e_{ijk} \tag{2.2.8}$$

($k = 1, 2, \ldots, n_{ij}$, $j = 1, \ldots, c$, $i = 1, 2$) where $c_j$, $\gamma_{ij}$ and $e_{ijk}$ are mutually independent with subject effects $e_{ijk}$ iid $N(0, r^{-1})$, treatment-clinic interactions $\gamma_{ij}$ iid $N(0, (\lambda_2 r)^{-1})$ and clinic effects $c_j$ iid $N(0, (\lambda_1 r)^{-1})$; $\mu_i$ is the effect due to the $i$th treatment. Now we will write down the $Y$, $X$, $Z$, $b$, $v$ and $e$ for (2.2.8). For ease of presentation, assume $n_{ij} > 0$ for all $i = 1, 2$, $j = 1, \ldots, c$. Then writing

$$Y = (Y_{111}, \ldots, Y_{11n_{11}}, \ldots, Y_{1c1}, \ldots, Y_{1cn_{1c}}, \ldots, Y_{2c1}, \ldots, Y_{2cn_{2c}})^T, \quad b = (\mu_1, \mu_2)^T,$$

$$v = (c_1, \ldots, c_c, \gamma_{11}, \ldots, \gamma_{1c}, \ldots, \gamma_{2c})^T, \quad X = \bigoplus_{i=1}^{2} 1_{n_{i\cdot}}, \quad Z_1 = \bigoplus_{j=1}^{c} 1_{n_{\cdot j}}, \quad Z_2 = \bigoplus_{i=1}^{2} \bigoplus_{j=1}^{c} 1_{n_{ij}}, \quad Z = (Z_1 \;\; Z_2),$$

$$e = (e_{111}, \ldots, e_{11n_{11}}, \ldots, e_{1c1}, \ldots, e_{1cn_{1c}}, \ldots, e_{2c1}, \ldots, e_{2cn_{2c}})^T,$$

$\Psi = I_{n_T}$, $\lambda = (\lambda_1, \lambda_2)^T$ and $D(\lambda) = \operatorname{Diag}(\lambda_1^{-1} I_c, \lambda_2^{-1} I_{2c})$, it is clear that (2.2.8) is a special case of (2.2.1), where in the above $n_{i\cdot} = \sum_{j=1}^{c} n_{ij}$, $i = 1, 2$, $n_{\cdot j} = \sum_{i=1}^{2} n_{ij}$, $j = 1, \ldots, c$ and $n_T = \sum_{i=1}^{2} \sum_{j=1}^{c} n_{ij}$ = total number of observations. We want to estimate $\mu_1 - \mu_2$, which is a special case of $\zeta(b, v)$ with $S = (1 \;\; {-1})$ and $T = 0$.

Next consider the animal breeding experiment. Suppose $b$ breeds are randomly selected and $d$ different diets are used. Once again, one can use the linear model given by (2.2.8) for inferential purposes. In this case one may be interested in predicting some suitable linear functions of the random quantities $c_j$ and $\gamma_{ij}$, known as the breeding values. These predicted breeding values can be used as a selection index for selecting the most suitable breeds for future breeding purposes.

As a concrete example in animal breeding we will discuss the example considered by Harville (in press) which involves prediction of the average birth weights of an infinite number of single-birth male lambs that are offspring of different sires in different population lines. The data consist of the weights (at birth) of 62 single-birth male lambs, and came from five distinct population lines. Each lamb was the progeny of one of 23 rams, and each lamb had a different dam. Age of the dam was recorded as belonging to one of three categories, numbered 1 (1-2 years), 2 (2-3 years), and 3 (over 3 years). Let $Y_{ijkd}$ represent the weight (at birth) of the $d$th of those lambs that are the offspring of the $k$th sire in the $j$th population line and a dam belonging to the $i$th age category. Following Harville (in press), we will use the mixed linear model

$$Y_{ijkd} = \mu + \delta_i + \pi_j + s_{jk} + e_{ijkd} \tag{2.2.9}$$

where $d = 1, \ldots, n_{ijk}$, $k = 1, \ldots, m_j$, $i = 1, 2, 3$ and $j = 1, \ldots, 5$, where $n_{ijk}$ is the number of lambs whose dams belong to the $i$th age category when the population line is $j$ and the sire is $k$ and $m_j$ is the total number of lambs whose sires are from population line $j$. Here the age effects $(\delta_1, \delta_2, \delta_3)$ and line effects $(\pi_1, \ldots, \pi_5)$ are considered as fixed effects and the sire (within line) effects $s_{jk}$ are iid $N(0, (r\lambda)^{-1})$ and independent of the error variables $e_{ijkd}$ which are iid $N(0, r^{-1})$. To make the design matrix associated with the fixed effects full rank we can take $\delta_3 = 0 = \pi_5$, which is the usual formulation needed for GLM Procedures in SAS.

Let $\mu_{ij} = E(Y_{ijkd}) = \mu + \delta_i + \pi_j$ and let there be $n_{i\cdot\cdot}$ observations corresponding to the $i$th age category. We will be interested in predicting

$$w_{jk} = \frac{1}{n_T} \sum_{i=1}^{3} n_{i\cdot\cdot}\, \mu_{ij} + s_{jk} = \mu + \frac{1}{n_T} \sum_{i=1}^{3} n_{i\cdot\cdot}\, \delta_i + \pi_j + s_{jk}, \tag{2.2.10}$$

where $n_T = \sum_{i=1}^{3} n_{i\cdot\cdot}$. The value of $w_{jk}$ can be interpreted as the average birth weight of an infinite number of male lambs that are offspring of the $k$th sire in the $j$th line. Clearly, this is a special case of the general problem we are interested in with appropriately defined $b$, $v$, $X$, $Z$, $S$ and $T$.
2.3  Hierarchical Bayes Analysis

In this section, for the finite population sampling we provide the predictive distribution of $Y^{(2)}$ given $Y^{(1)} = y^{(1)}$, and for the infinite population set up we provide the posterior distribution of the vector of effects $(b^T, v^T)^T$ given $Y = y$.

We will use the following notations to label certain distributions to be used in this section. A random variable $Z$ is said to have a gamma($\alpha$, $\beta$) distribution if it has pdf

$$f(z) = \big[\exp(-\alpha z)\, z^{\beta - 1} \alpha^{\beta} / \Gamma(\beta)\big]\, I_{[z > 0]}. \tag{2.3.1}$$

A random vector $T = (T_1, \ldots, T_p)^T$ is said to have a multivariate t-distribution with location parameter $\mu$, scale parameter $\Phi$, a p.d. $p \times p$ matrix, and degrees of freedom (d.f.) $\nu$ if it has pdf

$$g(t) \propto |\Phi|^{-1/2} \big[\nu + (t - \mu)^T \Phi^{-1} (t - \mu)\big]^{-(\nu + p)/2} \tag{2.3.2}$$

(see Zellner, 1971, p. 383, or Press, 1972, p. 136). Here $|E|$ denotes the determinant of a square matrix $E$. Assume $\nu > 2$. Then $E(T) = \mu$, $V(T) = (\nu/(\nu - 2))\Phi$.
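
The two moments of the multivariate t-distribution are used repeatedly in what follows, so a small helper may be convenient. The Python sketch below is illustrative only: the function name `mvt_moments` and the numerical inputs are hypothetical. It returns the mean and the covariance matrix, and it rejects degrees of freedom of 2 or less, for which the covariance matrix is not finite.

```python
import numpy as np

def mvt_moments(mu, Phi, nu):
    """Mean and covariance of a multivariate t with location mu, scale Phi, d.f. nu."""
    if nu <= 2:
        raise ValueError("the covariance matrix requires nu > 2")
    mu = np.asarray(mu, dtype=float)
    Phi = np.asarray(Phi, dtype=float)
    return mu, (nu / (nu - 2.0)) * Phi

# Hypothetical inputs: a bivariate t with 5 degrees of freedom.
mean, cov = mvt_moments([1.0, -1.0], 3.0 * np.eye(2), nu=5)
```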

We assume conditions (A) and (B) given at the beginning of Section 2.2. In stage (C) of the model, it is assumed that

(CI) $B$, $R$, $\Lambda_1 R, \ldots, \Lambda_t R$ are independently distributed with $B \sim$ uniform($\mathbb{R}^p$), $R \sim$ gamma($\tfrac{1}{2} a_0$, $\tfrac{1}{2} g_0$), $a_0 \ge 0$, $g_0 \ge 0$, $\Lambda_i R \sim$ gamma($\tfrac{1}{2} a_i$, $\tfrac{1}{2} g_i$) with $a_i > 0$, $g_i \ge 0$ ($i = 1, \ldots, t$).

Allowing $a_0$ and some of the $g_i$ to be zero, some improper gamma distributions are included as a possibility in our prior.

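Before stating the predictive distribution of $Y^{(2)}$ given $Y^{(1)} = y^{(1)}$, we need to introduce a few matrix notations. We write $\Sigma \equiv \Sigma(\lambda) = \Psi + Z D(\lambda) Z^T$, and partition $\Sigma$ into

$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.$$

Also, let

$$\Sigma_{22.1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}, \tag{2.3.3}$$

$$K = \Sigma_{11}^{-1} - \Sigma_{11}^{-1} X^{(1)} \big(X^{(1)T} \Sigma_{11}^{-1} X^{(1)}\big)^{-1} X^{(1)T} \Sigma_{11}^{-1}, \tag{2.3.4}$$

$$M = \Sigma_{21} K + X^{(2)} \big(X^{(1)T} \Sigma_{11}^{-1} X^{(1)}\big)^{-1} X^{(1)T} \Sigma_{11}^{-1}, \tag{2.3.5}$$

$$G = \Sigma_{22.1} + \big(X^{(2)} - \Sigma_{21} \Sigma_{11}^{-1} X^{(1)}\big) \big(X^{(1)T} \Sigma_{11}^{-1} X^{(1)}\big)^{-1} \big(X^{(2)} - \Sigma_{21} \Sigma_{11}^{-1} X^{(1)}\big)^T. \tag{2.3.6}$$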
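
The matrices in (2.3.3) to (2.3.6) translate directly into a few lines of linear algebra. The Python sketch below is an illustration under stated assumptions, not part of the original derivation: the function name `predictive_matrices` is hypothetical, and the small example uses made-up nested error data with two areas, $\Psi = I$ and a hypothetical value of $\lambda$. Units are ordered with the sampled block first.

```python
import numpy as np

def predictive_matrices(Sigma, X1, X2):
    """Compute K, M and G of (2.3.4)-(2.3.6) from Sigma and the design blocks."""
    n1 = X1.shape[0]
    S11, S12 = Sigma[:n1, :n1], Sigma[:n1, n1:]
    S21, S22 = Sigma[n1:, :n1], Sigma[n1:, n1:]
    S11_inv = np.linalg.inv(S11)
    S11_inv_X1 = S11_inv @ X1                       # Sigma_11^{-1} X^(1)
    A_inv = np.linalg.inv(X1.T @ S11_inv_X1)        # (X^(1)T Sigma_11^{-1} X^(1))^{-1}
    K = S11_inv - S11_inv_X1 @ A_inv @ S11_inv_X1.T
    M = S21 @ K + X2 @ A_inv @ S11_inv_X1.T
    S22_1 = S22 - S21 @ S11_inv @ S12               # Sigma_22.1
    R = X2 - S21 @ S11_inv_X1
    G = S22_1 + R @ A_inv @ R.T
    return K, M, G

# Hypothetical example: 2 areas, sampled sizes (2, 3), unsampled sizes (2, 2).
lam = 1.5
Z1 = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
Z2 = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
Z = np.vstack([Z1, Z2])
Sigma = np.eye(9) + Z @ Z.T / lam                   # Psi + Z D(lambda) Z^T with Psi = I
X1 = np.column_stack([np.ones(5), np.arange(5.0)])
X2 = np.column_stack([np.ones(4), np.array([5.0, 6.0, 7.0, 8.0])])
K, M, G = predictive_matrices(Sigma, X1, X2)
```

Two identities that follow from the definitions give a quick sanity check: $K X^{(1)} = 0$ and $M X^{(1)} = X^{(2)}$.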


Now the predictive distribution of $Y^{(2)}$ given $Y^{(1)} = y^{(1)}$ is given in the following theorem in two steps. A proof of this theorem is deferred to Appendix A.
Theorem 2.3.1. Consider the model given in (2.2.2) and (CI). Assume that $n_T + \sum_{i=0}^{t} g_i - p > 2$. Then, conditional on $\Lambda = \lambda$ and $Y^{(1)} = y^{(1)}$, $Y^{(2)}$ has multivariate t-distribution with d.f. $n_T + \sum_{i=0}^{t} g_i - p$, location parameter $M y^{(1)}$, and scale parameter

$$\Big(n_T + \sum_{i=0}^{t} g_i - p\Big)^{-1} \Big[a_0 + \sum_{i=1}^{t} a_i \lambda_i + y^{(1)T} K y^{(1)}\Big] G.$$

Also, the conditional distribution of $\Lambda$ given $Y^{(1)} = y^{(1)}$ has pdf

$$f(\lambda \mid y^{(1)}) \propto |\Sigma_{11}|^{-1/2}\, \big|X^{(1)T} \Sigma_{11}^{-1} X^{(1)}\big|^{-1/2} \prod_{i=1}^{t} \lambda_i^{\frac{1}{2} g_i - 1} \Big[a_0 + \sum_{i=1}^{t} a_i \lambda_i + y^{(1)T} K y^{(1)}\Big]^{-\frac{1}{2}\left(n_T + \sum_{i=0}^{t} g_i - p\right)}. \tag{2.3.7}$$


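Using the moments of a multivariate t-distribution and the iterated formulas for conditional expectations and variances it follows from the above theorem that

$$E\big(Y^{(2)} \mid y^{(1)}\big) = E\big(M \mid y^{(1)}\big)\, y^{(1)}; \tag{2.3.8}$$

$$V\big(Y^{(2)} \mid y^{(1)}\big) = V\big(E(Y^{(2)} \mid \Lambda, y^{(1)}) \mid y^{(1)}\big) + E\big(V(Y^{(2)} \mid \Lambda, y^{(1)}) \mid y^{(1)}\big)$$

$$= V\big(M y^{(1)} \mid y^{(1)}\big) + \Big(n_T + \sum_{i=0}^{t} g_i - p - 2\Big)^{-1} E\Big[\Big\{a_0 + \sum_{i=1}^{t} a_i \Lambda_i + y^{(1)T} K y^{(1)}\Big\} G \,\Big|\, y^{(1)}\Big]. \tag{2.3.9}$$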


Using (2.3.8) and (2.3.9), it is possible to find the posterior mean and variance of $\xi = \xi(Y^{(1)}, Y^{(2)}) = AY^{(1)} + CY^{(2)}$ where $A$ and $C$ are known matrices. The Bayes estimate of $\xi(Y^{(1)}, Y^{(2)})$ under any quadratic loss is its posterior mean, and is given by

$$e_{BF}\big(y^{(1)}\big) = \big[A + C\, E\big(M \mid y^{(1)}\big)\big]\, y^{(1)} \tag{2.3.10}$$

using (2.3.8). Similarly, using (2.3.9), one may obtain

$$V\big(\xi(Y^{(1)}, Y^{(2)}) \mid y^{(1)}\big) = C\, V\big(Y^{(2)} \mid y^{(1)}\big)\, C^T. \tag{2.3.11}$$

Note that when $A = \bigoplus_{i=1}^{m} 1_{n_i}^T$ and $C = \bigoplus_{i=1}^{m} 1_{N_i - n_i}^T$, $\xi(Y^{(1)}, Y^{(2)})$ reduces to the vector of finite population totals for the $m$ small areas, whereas for the choice $A = \bigoplus_{i=1}^{m} \big(1_{n_i}^T / N_i\big)$ and $C = \bigoplus_{i=1}^{m} \big(1_{N_i - n_i}^T / N_i\big)$, $\xi(Y^{(1)}, Y^{(2)})$ reduces to the vector of finite population means for the $m$ small areas.
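
The choice of $A$ and $C$ that yields the finite population means can be written out explicitly. The following Python sketch is illustrative only: the function name `mean_weights` and the toy numbers are hypothetical. It builds the two direct sums and confirms on made-up data that $AY^{(1)} + CY^{(2)}$ is the vector of area means.

```python
import numpy as np

def mean_weights(n, N):
    """Build A = direct sum of 1_{n_i}^T / N_i and C = direct sum of 1_{N_i-n_i}^T / N_i."""
    n = np.asarray(n, dtype=int)
    N = np.asarray(N, dtype=int)
    m = n.size
    A = np.zeros((m, n.sum()))
    C = np.zeros((m, (N - n).sum()))
    a = c = 0
    for i in range(m):
        A[i, a:a + n[i]] = 1.0 / N[i]            # weights on sampled units of area i
        C[i, c:c + N[i] - n[i]] = 1.0 / N[i]     # weights on unsampled units of area i
        a += n[i]
        c += N[i] - n[i]
    return A, C

# Hypothetical population: area 1 holds (1, 2, 6), area 2 holds (3, 1, 2, 3).
n, N = [2, 1], [3, 4]
y1 = np.array([1.0, 2.0, 3.0])         # sampled units, ordered by area
y2 = np.array([6.0, 1.0, 2.0, 3.0])    # unsampled units, ordered by area
A, C = mean_weights(n, N)
xi = A @ y1 + C @ y2
```

Multiplying row $i$ of both matrices by $N_i$ gives the corresponding matrices for the finite population totals.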

Now we will get back to the infinite population set up to provide the posterior distribution of $W = (b^T, v^T)^T$ given $Y = y$, which will be used to find the posterior mean and variance of $\zeta(b, v)$, a vector of linear combinations of $W$. The posterior distribution is given in the following theorem. A proof of this theorem will be omitted because of its similarity to the proof of Theorem 2.3.1. We will consider the model given by (A) and (B) of Section 2.2 and (CI). Recall from the middle of Section 2.2 that we have redefined the dimensions of $Y$, $X$, $Z$ and $e$ appearing there by $Y$ ($n_T \times 1$), $X$ ($n_T \times p$), $Z$ ($n_T \times q$) and $e$ ($n_T \times 1$). Also we have assumed rank($X$) $= p$. Now we will state the theorem.

Theorem 2.3.2. Consider the model stated above and assume that $n_T + \sum_{i=0}^{t} g_i - p > 2$. Then, conditional on $\Lambda = \lambda$ and $Y = y$, $W$ has multivariate t-distribution with d.f. $n_T + \sum_{i=0}^{t} g_i - p$, location parameter $Py$, and scale parameter

$$\Big(n_T + \sum_{i=0}^{t} g_i - p\Big)^{-1} \Big[a_0 + \sum_{i=1}^{t} a_i \lambda_i + y^T Q y\Big] H,$$

where

$$Q = \Sigma^{-1} - \Sigma^{-1} X \big(X^T \Sigma^{-1} X\big)^{-1} X^T \Sigma^{-1}; \tag{2.3.12}$$

$$P^T = \Big(\Sigma^{-1} X \big(X^T \Sigma^{-1} X\big)^{-1} \quad QZD\Big); \tag{2.3.13}$$

$$H = \begin{pmatrix} \big(X^T \Sigma^{-1} X\big)^{-1} & -\big(X^T \Sigma^{-1} X\big)^{-1} X^T \Sigma^{-1} Z D \\ -D Z^T \Sigma^{-1} X \big(X^T \Sigma^{-1} X\big)^{-1} & D - D Z^T Q Z D \end{pmatrix}. \tag{2.3.14}$$

Also, the conditional distribution of $\Lambda$ given $Y = y$ has pdf

$$f(\lambda \mid y) \propto |\Sigma|^{-1/2}\, \big|X^T \Sigma^{-1} X\big|^{-1/2} \prod_{i=1}^{t} \lambda_i^{\frac{1}{2} g_i - 1} \Big[a_0 + \sum_{i=1}^{t} a_i \lambda_i + y^T Q y\Big]^{-\frac{1}{2}\left(n_T + \sum_{i=0}^{t} g_i - p\right)}. \tag{2.3.15}$$


Again using the moments of the multivariate t-distribution, and the iterated formulas for expectation and variance, the above theorem can be used to find the computational formulas for $E(W \mid y)$ and $V(W \mid y)$ as in (2.3.8) and (2.3.9). Similarly, one can find $E[\zeta(b, v) \mid y] = e_{BI}(y)$ (say) and $V(\zeta(b, v) \mid y)$ as in (2.3.10) and (2.3.11), where $\zeta(b, v) = Sb + Tv$ for known matrices $S$ ($u \times p$) and $T$ ($u \times q$).

Applications of these two theorems will be considered in Section 2.4 to some actual data sets. There we will carry out an HB analysis of the data sets which appeared in Battese et al. (1988) and Harville (in press).

Before we conclude this section, we will make a final observation. A comparison of (2.3.4) and (2.3.7) with (2.3.12) and (2.3.15) reveals that if we replace $y$ by $y^{(1)}$, $X$ by $X^{(1)}$ and $\Sigma$ by $\Sigma_{11}$ in $f(\lambda \mid y)$ in (2.3.15) we obtain $f(\lambda \mid y^{(1)})$ as given in (2.3.7). This observation will be referred to in Section 2.4.
2.4  Applications of Hierarchical Bayes Analysis

This section concerns the analysis of two real data sets using the HB procedures suggested in Section 2.3. The first data set is related to the prediction of corn and soybeans for 12 counties in north-central Iowa based on the 1978 June Enumerative Survey as well as LANDSAT satellite data. It appeared in Battese, Harter and Fuller (BHF) who conducted a variance components analysis for this problem. The second data set originally appeared in Harville and Fenech (1985) and reappeared in Harville (in press) where he conducted a variance components as well as an HB analysis to predict $w_{jk}$, given in (2.2.10), the average weight of an infinite number of single-birth male lambs that are offspring of the $k$th sire in the $j$th population line.

We  will  first  consider  the  BHF  data  set.   To  start 
with,  we  briefly  give  a  background  of  this  problem.   The 
USDA  Statistical  Reporting  Service  field  staff  determined 
the  area  of  corn  and  soybeans  in  37  sample  segments  (each 
segment  about  250  hectares)  of  12  counties  in  north-central 
Iowa  by  interviewing  farm  operators.   Based  on  LANDSAT 
readings  obtained  during  August  and  September  1978,  USDA 
procedures  were  used  to  classify  the  crop  cover  for  all 
pixels  (a  term  for  "picture  element"  about  .45  hectares)  in 
the  12  counties.   The  number  of  segments  in  each  county, 
the number of hectares of corn and soybeans (as reported in
the  June  Enumerative  Survey),  the  number  of    pixels 
classified  as  corn  and  soybeans  for  each  sample  segment, 
and  the  county  mean  number  of  pixels  classified  as  corn  and 
soybeans  (the  total  number  of  pixels  classified  as  that 
crop  divided  by  the  number  of  segments  in  that  county)  are 
reported  in  Table  1  of  BHF.   For  ready  reference,  it  is 
reproduced  in  Table  2.1.   In  order  to  make  our  results 
comparable  to  that  of  BHF,  the  second  segment  in  Hardin 
county  was  ignored. 

The model considered by BHF is

$$Y_{ij} = b_0 + b_1 x_{1ij} + b_2 x_{2ij} + v_i + e_{ij}, \tag{2.4.1}$$

where $i$ is a subscript for the county, and $j$ is a subscript for a segment within the given county ($j = 1, \ldots, N_i$, the number of segments in the $i$th county, $i = 1, \ldots, 12$). Here $x_{1ij}$ is the number of pixels of corn and $x_{2ij}$ is the number of pixels of soybeans for the $j$th segment in the $i$th county. They assumed (in our notations) $E(v_i) = E(e_{ij}) = 0$, $V(v_i) = (\lambda r)^{-1}$, $V(e_{ij}) = r^{-1}$, $\operatorname{Cov}(v_i, e_{i'j}) = 0$, $\operatorname{Cov}(v_i, v_{i'}) = 0$ ($i \ne i'$), $\operatorname{Cov}(e_{ij}, e_{i'j'}) = 0$ if $(i, j) \ne (i', j')$. As in BHF, we are interested in predicting $\bar{Y}_i = N_i^{-1} \sum_{j=1}^{N_i} Y_{ij}$, the finite population mean for the $i$th county both for corn and soybeans. Using (2.4.1),


Table 2.1 Survey and Satellite Data for Corn and Soybeans in 12 Iowa Counties

              No. of segments   Reported hectares   No. of pixels in     Mean no. of pixels
                                                    sample segments      per segment*
County        Sample   County   Corn      Soybeans  Corn      Soybeans   Corn      Soybeans

Cerro Gordo   1        545      165.76    8.09      374       55         295.29    189.70
Hamilton      1        566      96.32     106.03    209       218        300.40    196.65
Worth         1        394      76.08     103.60    253       250        289.60    205.28
Humboldt      2        424      185.35    6.47      432       96         290.74    220.22
                                116.43    63.82     367       178
Franklin      3        564      162.08    43.50     361       137        318.21    188.06
                                152.04    71.43     288       206
                                161.75    42.49     369       165
Pocahontas    3        570      92.88     105.26    206       218        257.17    247.13
                                149.94    76.49     316       221
                                64.75     174.34    145       338
Winnebago     3        402      127.07    95.67     355       128        291.77    185.37
                                133.55    76.57     295       147
                                77.70     93.48     223       204
Wright        3        567      206.39    37.84     459       77         301.26    221.36
                                108.33    131.12    290       217
                                118.17    124.44    307       258
Webster       4        687      99.96     144.15    252       303        262.17    247.09
                                140.43    103.60    293       221
                                98.95     88.59     206       222
                                131.04    115.58    302       274
Hancock       5        569      114.12    99.15     313       190        314.28    198.66
                                100.60    124.56    246       270
                                127.88    110.88    353       172
                                116.90    109.14    271       228
                                87.41     143.66    237       297
Kossuth       5        965      93.48     91.05     221       167        298.65    204.61
                                121.00    132.33    369       191
                                109.91    143.14    343       249
                                122.66    104.13    342       182
                                104.21    118.57    294       179
Hardin        6        556      88.59     102.59    220       262        325.99    177.05
                                88.59     29.46     340       87
                                165.35    69.28     355       160
                                104.00    99.15     261       221
                                88.63     143.66    187       345
                                153.70    94.49     350       190

* The mean number of pixels of a given crop per segment in a county is the total number of pixels classified as that crop, divided by the number of segments in that county.

$\bar{Y}_i$ can be written as $\bar{Y}_i = \mu_i + \bar{e}_i$ where $\bar{e}_i = N_i^{-1} \sum_{j=1}^{N_i} e_{ij}$, $\mu_i = b_0 + b_1 \bar{x}_{1i(p)} + b_2 \bar{x}_{2i(p)} + v_i$, $\bar{x}_{1i(p)} = N_i^{-1} \sum_{j=1}^{N_i} x_{1ij}$ and $\bar{x}_{2i(p)} = N_i^{-1} \sum_{j=1}^{N_i} x_{2ij}$. Under the assumptions of model (2.4.1), $\mu_i$ can be interpreted as the conditional mean of the hectares of corn (or soybeans) per segment, given the realized county effect $v_i$ and the values of the satellite data. Clearly, the mean $\bar{Y}_i$ is not equivalent to $\mu_i$, because the average of the $e_{ij}$ over the finite population of segments in county $i$ is not identically 0. However, either if the $N_i$ are large or if the sampling rates $n_i/N_i$ ($i = 1, \ldots, m$) are small, then the predictor of $\mu_i$ is an appropriate predictor of $\bar{Y}_i$. In this example, either of the conditions appears to be true. For predicting $\bar{Y}_i$, first assuming $\lambda$ and $r$ known, BHF obtained BLUPs of $\mu_i$ ($i = 1, \ldots, 12$). Then, using Henderson's Method III, they obtained estimates of the variance components, and the final predictors involved the estimated variance components. Henderson's method, being an ANOVA method, could lead to negative estimates of $\lambda^{-1}$. If this were the case, BHF set it equal to zero. This phenomenon is likely to happen, particularly when the number of small areas or strata is small.

We use instead Theorem 2.3.1 and Theorem 2.3.2 respectively to find the first two posterior moments of $\bar{Y}_i$ and $\mu_i$. In the special case of the nested error regression model, we will now develop the expressions for the posterior distribution given by (2.3.7) and the expressions for the posterior means and variances of $\bar{Y}_i$ and $\mu_i$. Here, we have $t = 1$, $\lambda_1 = \lambda$, $D(\lambda) = \lambda^{-1} I_m$, $\Psi = I_{N_T}$. Then $\Sigma_{11} = \bigoplus_{i=1}^{m} \big(I_{n_i} + \lambda^{-1} J_{n_i}\big)$ so that $|\Sigma_{11}| = \prod_{i=1}^{m} \{(\lambda + n_i)/\lambda\}$. Also, writing $\bar{x}_{i(s)} = n_i^{-1} \sum_{j=1}^{n_i} x_{ij}$ ($i = 1, \ldots, m$) where $x_{ij} = (1, x_{1ij}, x_{2ij})^T$, one gets

$$X^{(1)T} \Sigma_{11}^{-1} X^{(1)} = \sum_{i=1}^{m} \sum_{j=1}^{n_i} x_{ij} x_{ij}^T - \sum_{i=1}^{m} n_i^2 (n_i + \lambda)^{-1} \bar{x}_{i(s)} \bar{x}_{i(s)}^T = H(\lambda) \text{ (say)}. \tag{2.4.2}$$


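Next writing $\bar{y}_{i(s)} = n_i^{-1} \sum_{j=1}^{n_i} y_{ij}$, one gets

$$y^{(1)T} K y^{(1)} = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \big(y_{ij} - \bar{y}_{i(s)}\big)^2 + \lambda \sum_{i=1}^{m} n_i (n_i + \lambda)^{-1} \bar{y}_{i(s)}^2$$

$$\quad - \Big\{\sum_{i=1}^{m} \sum_{j=1}^{n_i} x_{ij} \big(y_{ij} - n_i (n_i + \lambda)^{-1} \bar{y}_{i(s)}\big)\Big\}^T H^{-1}(\lambda) \Big\{\sum_{i=1}^{m} \sum_{j=1}^{n_i} x_{ij} \big(y_{ij} - n_i (n_i + \lambda)^{-1} \bar{y}_{i(s)}\big)\Big\}$$

$$= Q_0(\lambda) \text{ (say)}. \tag{2.4.3}$$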
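
The closed forms for $H(\lambda)$ and $Q_0(\lambda)$ avoid inverting $\Sigma_{11}$, and they can be checked against the matrix definitions on a small example. The Python sketch below is an illustration only: the function name `H_and_Q0`, the toy data and the value of $\lambda$ are hypothetical. It evaluates both quantities area by area and then recomputes them directly from $\Sigma_{11} = I + \lambda^{-1} Z^{(1)} Z^{(1)T}$ and $K$ of (2.3.4).

```python
import numpy as np

def H_and_Q0(x, y, area, lam):
    """Closed forms (2.4.2) and (2.4.3); rows of x are x_ij^T, area holds labels."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    area = np.asarray(area)
    H = x.T @ x
    Q0 = 0.0
    u = np.zeros(x.shape[1])          # sum of x_ij (y_ij - n_i (n_i + lam)^{-1} ybar_i)
    for i in np.unique(area):
        xi, yi = x[area == i], y[area == i]
        ni = yi.size
        xbar, ybar = xi.mean(axis=0), yi.mean()
        shrink = ni / (ni + lam)
        H -= ni * shrink * np.outer(xbar, xbar)    # n_i^2 (n_i + lam)^{-1} xbar xbar^T
        Q0 += np.sum((yi - ybar) ** 2) + lam * shrink * ybar ** 2
        u += xi.T @ (yi - shrink * ybar)
    Q0 -= u @ np.linalg.solve(H, u)
    return H, Q0

# Hypothetical data: two areas with 2 and 3 sampled units, intercept plus one covariate.
x = np.array([[1, 0], [1, 1], [1, 2], [1, 4], [1, 7]], dtype=float)
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
area = np.array([0, 0, 1, 1, 1])
lam = 1.5
H, Q0 = H_and_Q0(x, y, area, lam)

# Direct evaluation from the matrix definitions, for comparison.
Z1 = (area[:, None] == np.unique(area)[None, :]).astype(float)
S11_inv = np.linalg.inv(np.eye(y.size) + Z1 @ Z1.T / lam)
H_direct = x.T @ S11_inv @ x
K = S11_inv - S11_inv @ x @ np.linalg.solve(H_direct, x.T @ S11_inv)
Q0_direct = y @ K @ y
```

The closed forms only need per-area sums, which is why they scale to many areas without forming any $n_T \times n_T$ matrix.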

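The conditional pdf $f(\lambda \mid y^{(1)})$ given in (2.3.7) simplifies to

$$f(\lambda \mid y^{(1)}) \propto \lambda^{\frac{1}{2}(m + g_1) - 1} \prod_{i=1}^{m} (\lambda + n_i)^{-1/2}\, |H(\lambda)|^{-1/2}\, \big(a_0 + a_1 \lambda + Q_0(\lambda)\big)^{-\frac{1}{2}(n_T + g_0 + g_1 - p)}. \tag{2.4.4}$$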
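
Because $\lambda$ is one-dimensional here, posterior expectations with respect to $f(\lambda \mid y^{(1)})$ can be approximated by evaluating the density on a grid. The Python sketch below is only one possible way to do this and is not the numerical method of the original analysis: the function name `log_post_lambda`, the toy data, the hyperparameter values and the grid limits are all hypothetical, and truncating the grid is an approximation. The density is evaluated from the general form (2.3.7) with $t = 1$ and $\Psi = I$, which (2.4.4) rewrites in closed form.

```python
import numpy as np

def log_post_lambda(lam, X1, y1, Z1, a0, a1, g0, g1):
    """Unnormalized log f(lambda | y^(1)) from (2.3.7) with t = 1 and Psi = I."""
    nT, p = X1.shape
    S11 = np.eye(nT) + Z1 @ Z1.T / lam
    S11_inv = np.linalg.inv(S11)
    XtSX = X1.T @ S11_inv @ X1
    K = S11_inv - S11_inv @ X1 @ np.linalg.solve(XtSX, X1.T @ S11_inv)
    quad = a0 + a1 * lam + y1 @ K @ y1
    _, logdet_S = np.linalg.slogdet(S11)
    _, logdet_X = np.linalg.slogdet(XtSX)
    return (-0.5 * logdet_S - 0.5 * logdet_X + (0.5 * g1 - 1.0) * np.log(lam)
            - 0.5 * (nT + g0 + g1 - p) * np.log(quad))

# Hypothetical data: two areas with 2 and 3 sampled units.
X1 = np.array([[1, 0], [1, 1], [1, 2], [1, 4], [1, 7]], dtype=float)
y1 = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
Z1 = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
n = Z1.sum(axis=0)                                  # sample sizes n_i
a0, a1, g0, g1 = 0.005, 0.005, 0.0, 0.0             # hypothetical hyperparameters

grid = np.linspace(0.01, 50.0, 2000)                # equally spaced, truncated grid
logf = np.array([log_post_lambda(l, X1, y1, Z1, a0, a1, g0, g1) for l in grid])
w = np.exp(logf - logf.max())
w /= w.sum()                                        # discrete posterior weights

# Approximate E[n_i (n_i + lambda)^{-1} | y^(1)] for each area.
shrink_mean = np.array([np.sum(w * ni / (ni + grid)) for ni in n])
```

Subtracting the maximum of the log density before exponentiating keeps the weights numerically stable; the same weights can be reused for any other function of $\lambda$.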


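Next, writing $f_i = (N_i - n_i)/N_i$ and
$\bar{x}_i^* = (N_i - n_i)^{-1} \sum_{j=n_i+1}^{N_i} x_{ij}$, the posterior
means, variances and covariances of the finite population means are given by

\[
\begin{aligned}
E\Big[ N_i^{-1} \sum_{j=1}^{N_i} Y_{ij} \,\Big|\, y^{(1)} \Big]
&= (1 - f_i) \bar{y}_{i(s)}
+ f_i E\big[ n_i (n_i + \lambda)^{-1} \bar{y}_{i(s)} \,\big|\, y^{(1)} \big] \\
&\quad + f_i E\Big[ \{ \bar{x}_i^* - n_i (n_i + \lambda)^{-1} \bar{x}_{i(s)} \}^T H^{-1}(\lambda)
\Big\{ \sum_{k=1}^{m} \sum_{j=1}^{n_k} x_{kj} \big( y_{kj} - n_k (n_k + \lambda)^{-1} \bar{y}_{k(s)} \big) \Big\} \,\Big|\, y^{(1)} \Big] \\
&= e_{HB} \text{ (say)};
\end{aligned}
\tag{2.4.5}
\]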


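\[
\begin{aligned}
V\Big[ N_i^{-1} \sum_{j=1}^{N_i} Y_{ij} \,\Big|\, y^{(1)} \Big]
&= f_i^2 \, V\Big[ n_i (n_i + \lambda)^{-1} \bar{y}_{i(s)}
+ \{ \bar{x}_i^* - n_i (n_i + \lambda)^{-1} \bar{x}_{i(s)} \}^T H^{-1}(\lambda)
\Big\{ \sum_{k=1}^{m} \sum_{j=1}^{n_k} x_{kj} \big( y_{kj} - n_k (n_k + \lambda)^{-1} \bar{y}_{k(s)} \big) \Big\} \,\Big|\, y^{(1)} \Big] \\
&\quad + (n_T + g_0 + g_1 - p - 2)^{-1} f_i \, E\Big[ \{ a_0 + a_1 \lambda + Q_0(\lambda) \} \\
&\qquad \times \Big\{ (N_i + \lambda) \big( N_i (n_i + \lambda) \big)^{-1}
+ f_i \big( \bar{x}_i^* - n_i (n_i + \lambda)^{-1} \bar{x}_{i(s)} \big)^T H^{-1}(\lambda)
\big( \bar{x}_i^* - n_i (n_i + \lambda)^{-1} \bar{x}_{i(s)} \big) \Big\} \,\Big|\, y^{(1)} \Big] \\
&= V_1 + V_2 \text{ (say)} \\
&= s_{HB}^2 \text{ (say)};
\end{aligned}
\tag{2.4.6}
\]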


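and for $i \neq k$,

\[
\begin{aligned}
& \mathrm{Cov}\Big[ N_i^{-1} \sum_{j=1}^{N_i} Y_{ij}, \; N_k^{-1} \sum_{j=1}^{N_k} Y_{kj} \,\Big|\, y^{(1)} \Big] \\
&= f_i f_k \, \mathrm{Cov}\Big[ n_i (n_i + \lambda)^{-1} \bar{y}_{i(s)}
+ \{ \bar{x}_i^* - n_i (n_i + \lambda)^{-1} \bar{x}_{i(s)} \}^T H^{-1}(\lambda)
\Big\{ \sum_{l=1}^{m} \sum_{j=1}^{n_l} x_{lj} \big( y_{lj} - n_l (n_l + \lambda)^{-1} \bar{y}_{l(s)} \big) \Big\}, \\
&\qquad\qquad n_k (n_k + \lambda)^{-1} \bar{y}_{k(s)}
+ \{ \bar{x}_k^* - n_k (n_k + \lambda)^{-1} \bar{x}_{k(s)} \}^T H^{-1}(\lambda)
\Big\{ \sum_{l=1}^{m} \sum_{j=1}^{n_l} x_{lj} \big( y_{lj} - n_l (n_l + \lambda)^{-1} \bar{y}_{l(s)} \big) \Big\} \,\Big|\, y^{(1)} \Big] \\
&\quad + (n_T + g_0 + g_1 - p - 2)^{-1} f_i f_k \, E\Big[ \big( a_0 + a_1 \lambda + Q_0(\lambda) \big) \\
&\qquad \times \big( \bar{x}_i^* - n_i (n_i + \lambda)^{-1} \bar{x}_{i(s)} \big)^T H^{-1}(\lambda)
\big( \bar{x}_k^* - n_k (n_k + \lambda)^{-1} \bar{x}_{k(s)} \big) \,\Big|\, y^{(1)} \Big].
\end{aligned}
\tag{2.4.7}
\]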


Although we have provided the expressions for the covariances, which may
be necessary for providing a simultaneous confidence set for the finite
population mean vector, we will not use them here.

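Before we find the posterior means and variances of $\mu_i$ in the
infinite population setup, we give a general discussion comparing HB
predictors with EB predictors. Writing

\[
\begin{aligned}
E\Big[ N_i^{-1} \sum_{j=1}^{N_i} Y_{ij} \,\Big|\, \lambda, y^{(1)} \Big]
&= (1 - f_i) \bar{y}_{i(s)} + f_i n_i (n_i + \lambda)^{-1} \bar{y}_{i(s)} \\
&\quad + f_i \{ \bar{x}_i^* - n_i (n_i + \lambda)^{-1} \bar{x}_{i(s)} \}^T H^{-1}(\lambda)
\Big\{ \sum_{k=1}^{m} \sum_{j=1}^{n_k} x_{kj} \big( y_{kj} - n_k (n_k + \lambda)^{-1} \bar{y}_{k(s)} \big) \Big\} \\
&= g_1(\lambda) \text{ (say)},
\end{aligned}
\tag{2.4.8}
\]

and

\[
\begin{aligned}
V\Big[ N_i^{-1} \sum_{j=1}^{N_i} Y_{ij} \,\Big|\, \lambda, y^{(1)} \Big]
&= (n_T + g_0 + g_1 - p - 2)^{-1} f_i \{ a_0 + a_1 \lambda + Q_0(\lambda) \} \\
&\quad \times \Big\{ (N_i + \lambda) \big( N_i (n_i + \lambda) \big)^{-1}
+ f_i \big( \bar{x}_i^* - n_i (n_i + \lambda)^{-1} \bar{x}_{i(s)} \big)^T H^{-1}(\lambda)
\big( \bar{x}_i^* - n_i (n_i + \lambda)^{-1} \bar{x}_{i(s)} \big) \Big\} \\
&= g_2(\lambda) \text{ (say)},
\end{aligned}
\tag{2.4.9}
\]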


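we have from (2.4.5), (2.4.6), (2.4.8) and (2.4.9) that

\[
e_{HB} = E\big[ g_1(\lambda) \,\big|\, y^{(1)} \big],
\tag{2.4.10}
\]

\[
V_1 = V\big[ g_1(\lambda) \,\big|\, y^{(1)} \big],
\tag{2.4.11}
\]

and

\[
V_2 = E\big[ g_2(\lambda) \,\big|\, y^{(1)} \big].
\tag{2.4.12}
\]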
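The decomposition in (2.4.10) - (2.4.12) amounts to three one-dimensional integrals against the posterior density of $\lambda$. The following is a minimal sketch of how they might be approximated by trapezoidal quadrature on a grid. The functions `log_post`, `g1` and `g2` are placeholders to be supplied by the user, and the toy check uses an exponential density in place of (2.4.4) purely because its moments are known; none of this is the numerical method actually used for the tables.

```python
import math


def trapezoid_weights(grid):
    """Trapezoidal quadrature weights for an increasing grid."""
    w = [0.0] * len(grid)
    for k in range(len(grid) - 1):
        half = 0.5 * (grid[k + 1] - grid[k])
        w[k] += half
        w[k + 1] += half
    return w


def hb_moments(log_post, g1, g2, grid):
    """Return (e_HB, V1, V2) as in (2.4.10)-(2.4.12), integrating over `grid`."""
    logs = [log_post(lam) for lam in grid]
    top = max(logs)  # subtract the maximum to avoid underflow
    mass = [w * math.exp(v - top) for w, v in zip(trapezoid_weights(grid), logs)]
    total = sum(mass)
    prob = [mk / total for mk in mass]  # normalized posterior weights
    e_hb = sum(p * g1(lam) for p, lam in zip(prob, grid))
    v1 = sum(p * (g1(lam) - e_hb) ** 2 for p, lam in zip(prob, grid))
    v2 = sum(p * g2(lam) for p, lam in zip(prob, grid))
    return e_hb, v1, v2


# Toy check: exponential "posterior" with g1(lam) = lam and g2(lam) = 2.
grid = [0.001 * k for k in range(40001)]  # 0 to 40
e_hb, v1, v2 = hb_moments(lambda lam: -lam, lambda lam: lam, lambda lam: 2.0, grid)
```

A naive EB analysis would instead report `g1(lam_hat)` and `g2(lam_hat)` at a single point estimate `lam_hat`, so the `v1` component has no counterpart there.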


In EB analysis, to obtain the EB predictor, we usually replace
$E[g_1(\lambda) \mid y^{(1)}]$ by $g_1(\hat{\lambda}) = e_{EB}$ (say), where
$\hat{\lambda}$ is some estimate of $\lambda$, which can be an ML, REML or
ANOVA estimate, and we report a naive measure of posterior variance by
$g_2(\hat{\lambda}) = s_{EB}^2$ (say). Usually, the point estimates $e_{HB}$
and $e_{EB}$ are not too far apart. But if we measure the posterior variance
$s_{HB}^2$ by $s_{EB}^2$, we may underestimate the actual measure because of
the failure to account for the estimation of $\lambda$. We may grossly
underestimate the actual measure if $g_1(\lambda)$ varies too much within
the body of the posterior distribution of $\lambda$, as in this case $V_1$
will be significantly large. We will see in this example that for some of
the counties $V_1$ contributes a significant portion of the total variance.
We will also find that the percent contribution of $V_1$ to $s_{HB}^2$
usually increases with the relative difference between $e_{HB}$ and
$e_{EB}$.

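Now to develop the expressions for the posterior means and variances for
$\mu_i$, we use Theorem 2.3.2. Note that by an observation made at the end
of Section 2.3, $f(\lambda \mid y)$, the posterior distribution of
$\lambda$ given $y$, is given by (2.4.4). After considerable
simplifications in this particular case, we obtain

\[
\begin{aligned}
E[\mu_i \mid y]
&= E\big[ n_i (n_i + \lambda)^{-1} \bar{y}_{i(s)} \,\big|\, y \big] \\
&\quad + E\Big[ \{ \bar{x}_{i(p)} - n_i (n_i + \lambda)^{-1} \bar{x}_{i(s)} \}^T H^{-1}(\lambda)
\Big\{ \sum_{k=1}^{m} \sum_{j=1}^{n_k} x_{kj} \big( y_{kj} - n_k (n_k + \lambda)^{-1} \bar{y}_{k(s)} \big) \Big\} \,\Big|\, y \Big] \\
&= e_{HB}^* \text{ (say)};
\end{aligned}
\tag{2.4.13}
\]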


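\[
\begin{aligned}
V[\mu_i \mid y]
&= V\Big[ n_i (n_i + \lambda)^{-1} \bar{y}_{i(s)}
+ \{ \bar{x}_{i(p)} - n_i (n_i + \lambda)^{-1} \bar{x}_{i(s)} \}^T H^{-1}(\lambda)
\Big\{ \sum_{k=1}^{m} \sum_{j=1}^{n_k} x_{kj} \big( y_{kj} - n_k (n_k + \lambda)^{-1} \bar{y}_{k(s)} \big) \Big\} \,\Big|\, y \Big] \\
&\quad + (n_T + g_0 + g_1 - p - 2)^{-1} E\Big[ \{ a_0 + a_1 \lambda + Q_0(\lambda) \} \\
&\qquad \times \Big\{ (n_i + \lambda)^{-1}
+ \big( \bar{x}_{i(p)} - n_i (n_i + \lambda)^{-1} \bar{x}_{i(s)} \big)^T H^{-1}(\lambda)
\big( \bar{x}_{i(p)} - n_i (n_i + \lambda)^{-1} \bar{x}_{i(s)} \big) \Big\} \,\Big|\, y \Big] \\
&= s_{HB}^{*2} \text{ (say)};
\end{aligned}
\tag{2.4.14}
\]

where $\bar{x}_{i(p)} = (1, \bar{x}_{1i(p)}, \bar{x}_{2i(p)})^T$,
$H(\lambda)$ is as given in (2.4.2) and $Q_0(\lambda)$ is as given in
(2.4.3). Note that since $f_i \to 1$ and
$\bar{x}_{i(p)} - \bar{x}_i^* \to 0$ as $N_i \to \infty$, it can be seen
informally that the rhs of (2.4.5) and (2.4.6) approach in the limit
(2.4.13) and (2.4.14), respectively.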
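The limiting behaviour noted above can be illustrated numerically. The sketch below, with illustrative values of `n` and `lam` that are not taken from the data, evaluates the finite population quantities $f_i$ and $(N_i + \lambda)(N_i (n_i + \lambda))^{-1}$ appearing in (2.4.6) for increasing $N_i$; the second should approach the term $(n_i + \lambda)^{-1}$ of (2.4.14).

```python
def finite_population_terms(N, n, lam):
    """Return f = (N - n) / N and the factor (N + lam) / (N * (n + lam)) from (2.4.6)."""
    f = (N - n) / N
    factor = (N + lam) / (N * (n + lam))
    return f, factor


n, lam = 5, 2.0  # illustrative values only
limit_factor = 1.0 / (n + lam)
rows = [(N,) + finite_population_terms(N, n, lam) for N in (10, 100, 10_000, 10 ** 9)]
for N, f, factor in rows:
    # As N grows, f tends to 1 and factor tends to 1 / (n + lam).
    print(N, round(f, 6), round(factor, 6), round(limit_factor, 6))
```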

We will now get back to the actual data analysis of the BHF data set given
in Table 2.1. We use formulas (2.4.5), (2.4.6), (2.4.8), (2.4.9), (2.4.13)
and (2.4.14) to obtain HB and EB posterior means and variances of the
population means for the 12 counties. Our HB approach eliminates the
possibility of obtaining zero estimates of the variance components. A
number of different priors for $R$ and $R\lambda$ were tried, both
informative and noninformative. The results for the posterior means were
quite similar, whereas the posterior variances varied by as much as
approximately 10%. For the purpose of illustration, we have decided to
report our analysis for the prior with $a_0 = 0.005$, $g_0 = 0$,
$a_1 = 0.005$ and $g_1 = 0$. Since the choice of $a_1 = 0$ gives an improper
posterior distribution of $\lambda$, we took $a_1$ to be a small positive
number. Table 2.2 provides the HB predictors $e_{HB}$, the EB predictors
$e_{EB}$, the BHF predictors
Table 2.2  The Predicted Hectares of Corn and Associated Standard Errors

a_0 = .005    g_0 = 0    a_1 = .005    g_1 = 0

County         e_HB    e_EB    e_BHF    s_HB    s_EB    s_BHF

Cerro Gordo    122.1   122.2   122.2     9.3     9.4    10.3
Franklin       143.6   144.2   145.3     6.9     6.4     6.7
Hamilton       126.2   126.2   126.5     9.2     9.3    10.1
Hancock        124.6   124.4   124.2     5.3     5.3     5.5
Hardin         142.6   143.0   143.5     5.8     5.6     5.8
Humboldt       108.9   108.5   107.7     8.2     7.9     8.4
Kossuth        107.7   106.9   106.1     5.8     5.2     5.4
Pocahontas     111.8   112.1   112.9     6.6     6.4     6.8
Webster        114.9   115.3   116.0     5.9     5.7     6.0
Winnebago      113.3   112.8   112.1     6.6     6.4     6.8
Worth          107.1   106.8   105.6     9.9     9.1    10.0
Wright         122.0   122.0   122.1     6.4     6.5     6.9
$e_{BHF}$ and the respective associated standard errors $s_{HB}$, $s_{EB}$
and $s_{BHF}$ for the corn data. Table 2.3 provides the values of $e_{HB}$,
$e_{EB}$, $e_{BHF}$ and $e_{HB}^*$ for the soybeans data for the same
choice of prior hyperparameters, whereas Table 2.4 provides their
respective standard errors along with the components $V_1$ and $V_2$ of
$s_{HB}^2$. Values of $e_{BHF}$ and $s_{BHF}$ presented in Tables 2.2 - 2.4
are computed using FORTRAN from the formulas given in the BHF paper and are
slightly different from the values reported in Battese et al. (1988). From
Tables 2.2 and 2.3, for predicting corn and soybeans, one can see that
$e_{HB}$, $e_{HB}^*$, $e_{EB}$ and $e_{BHF}$ are quite close to each other.
From Tables 2.2 and 2.4, $s_{EB}$ and $s_{HB}$ appear to be smaller than
$s_{BHF}$. But since $s_{EB}$ is a naive EB posterior s.d., it is probably
an underestimate of the true measure. From Tables 2.3 and 2.4 we find
hardly any difference either between $e_{HB}$ and $e_{HB}^*$ or between
their standard errors $s_{HB}$ and $s_{HB}^*$. This is what we anticipated
for these data. To draw a clear comparison between HB and EB procedures, we
added one extra column at the end of Tables 2.3 and 2.4. The last column of
Table 2.3 measures the percent relative difference
$100 \times |e_{HB} - e_{EB}| / e_{HB}$ between EB and HB predicted values,
whereas the last column of Table 2.4 measures the percent contribution
($100 \times V_1/(V_1 + V_2)$%) of $V_1$ towards the total posterior
variance $s_{HB}^2$. Comparison of these two columns indicates that the
Table 2.3  The Predicted Hectares of Soybeans Obtained by Using Different
Procedures

a_0 = .005    g_0 = 0    a_1 = .005    g_1 = 0

County         e_HB    e*_HB   e_EB    e_BHF    |e_HB - e_EB|/e_HB x 100%

Cerro Gordo     78.8    78.8    78.2    77.5    0.78
Franklin        67.1    67.1    65.9    64.8    1.80
Hamilton        94.4    94.4    94.6    96.0    0.21
Hancock        100.4   100.4   100.8   101.1    0.40
Hardin          75.4    75.4    75.1    74.9    0.39
Humboldt        81.9    82.0    80.6    79.2    1.71
Kossuth        118.2   118.2   119.2   120.2    0.84
Pocahontas     113.9   113.9   113.7   113.8    0.18
Webster        110.0   110.0   109.7   109.6    0.37
Winnebago       97.3    97.3    98.0    98.7    0.72
Worth           87.8    87.8    87.2    86.6    0.68
Wright         111.9   111.9   112.4   112.9    0.45
Table 2.4  The Standard Errors Associated with Different Predictors of
Hectares of Soybeans

a_0 = .005    g_0 = 0    a_1 = .005    g_1 = 0

County         s_HB   s*_HB   s_EB   s_BHF     V_1      V_2    V_1/(V_1+V_2) x 100%

Cerro Gordo    11.7   11.7    11.6   12.7     7.67   128.59     5.1
Franklin        8.2    8.2     7.5    7.8    11.94    54.92    18.0
Hamilton       11.2   11.2    11.4   12.4     1.97   123.61     1.6
Hancock         6.2    6.3     6.1    6.3     1.35    37.59     3.4
Hardin          6.5    6.5     6.5    6.6     0.37    41.84     0.9
Humboldt       10.4   10.4     9.9   10.0    22.62    85.40    20.9
Kossuth         6.6    6.7     6.0    6.2     7.99    36.23    18.1
Pocahontas      7.5    7.5     7.5    7.9     0.06    55.98     0.1
Webster         6.6    6.7     6.6    6.8     0.64    43.51     1.5
Winnebago       7.7    7.8     7.5    7.9     4.11    55.70     6.9
Worth          11.1   11.1    11.1   12.1     4.06   118.17     3.3
Wright          7.7    7.7     7.6    8.0     1.62    57.48     2.7
contribution of $V_1$ usually increases with the relative difference. In
particular, for the counties Franklin, Humboldt and Kossuth these relative
differences are as high as 1.80%, 1.71% and 0.84%, and the corresponding
contributions of $V_1$ are as nonnegligible as 18.0%, 20.9% and 18.1%. This
made $s_{EB}$ much smaller than $s_{HB}$ for these counties. So if one uses
a naive EB or estimated BLUP approach, one will tend to underestimate the
mean squared error (MSE) of prediction. One should note that though BHF
used estimated BLUP, they tried to account for the uncertainty involved in
the estimation of $\lambda$ in their approximations of MSE. Similar
approximations of MSE of prediction have been suggested by Kackar and
Harville (1984), Prasad and Rao (1990) and Lahiri and Rao (1990).
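The last column of Table 2.4 and the HB standard errors can be recomputed from the tabulated components. The following sketch does this for the three counties singled out above, using the rounded $V_1$ and $V_2$ values as printed in Table 2.4; because the inputs are rounded, a recomputed entry may differ from the printed one in the last digit (for Franklin the rounded inputs give 17.9 rather than 18.0). The dictionary and function names are illustrative.

```python
import math

# (V1, V2) as printed in Table 2.4 for the soybeans data.
table_2_4 = {
    "Franklin": (11.94, 54.92),
    "Humboldt": (22.62, 85.40),
    "Kossuth": (7.99, 36.23),
}


def percent_contribution(v1, v2):
    """100 * V1 / (V1 + V2): share of the total posterior variance due to V1."""
    return 100.0 * v1 / (v1 + v2)


def hb_standard_error(v1, v2):
    """s_HB = sqrt(V1 + V2), following (2.4.6)."""
    return math.sqrt(v1 + v2)


summary = {
    county: (round(percent_contribution(v1, v2), 1), round(hb_standard_error(v1, v2), 1))
    for county, (v1, v2) in table_2_4.items()
}
```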

Now we will consider the lamb-weight data set of Harville (in press). The
background of the data set is given in the example presented at the end of
Section 2.2. We will use a model similar to the one given in (2.2.9) to
analyze the data set. There we assumed, following Harville (in press), the
population line effects as fixed. For the purpose of illustration with
three variance components, we will assume the population line effects as
random. This would have been appropriate if these five lines were randomly
selected from a large number of population lines. Also, to make the design
matrix associated with the fixed effects (age of dam) of full column rank,
we will write $\mu + \delta_i$ as $\mu_i$. Now we have the following mixed
linear model

\[
Y_{ijkd} = \mu_i + \pi_j + s_{jk} + e_{ijkd},
\tag{2.4.15}
\]

$d = 1, \ldots, n_{ijk}$, $k = 1, \ldots, m_j$, $i = 1, 2, 3$ and
$j = 1, \ldots, 5$, where $n_{ijk}$ and $m_j$ are the same as in (2.2.9).
We assume the $\pi_j$ are iid $N(0, (r\lambda_1)^{-1})$, the $s_{jk}$ are
iid $N(0, (r\lambda_2)^{-1})$ and the $e_{ijkd}$ are iid $N(0, r^{-1})$.
Moreover, $\pi_j$, $s_{jk}$ and $e_{ijkd}$ are assumed to be mutually
independent. We want to predict $w_{jk}$ given in (2.2.10). Using (2.4.15),
we will rewrite $w_{jk}$ as

\[
w_{jk} = \frac{1}{3} \sum_{i=1}^{3} \mu_i + \pi_j + s_{jk},
\tag{2.4.16}
\]

where $\pi_j$ and $s_{jk}$ are given in (2.2.10).

We will carry out a noninformative Bayesian analysis using a
uniform($R^3$) prior for $\mu = (\mu_1, \mu_2, \mu_3)^T$ and independent
gamma($\frac{1}{2} a_0, \frac{1}{2} g_0$),
gamma($\frac{1}{2} a_1, \frac{1}{2} g_1$) and
gamma($\frac{1}{2} a_2, \frac{1}{2} g_2$) priors for $R$, $R\lambda_1$ and
$R\lambda_2$, respectively. Since $w_{jk}$ is a linear combination of fixed
and random effects, we can use Theorem 2.3.2 to find its posterior mean
$E(w_{jk} \mid y) = e_{HB}$ (say) and its posterior variance
$V(w_{jk} \mid y) = s_{HB}^2$ (say). Using this theorem and the iterated
formulas for mean and variance, we can derive expressions for $e_{HB}$ and
$s_{HB}^2$ similar to the ones given by (2.4.13) and (2.4.14) for the BHF
example. The choice $a_0 = a_1 = a_2 = g_0 = g_1 = g_2 = 0$ of the
hyperparameters gives a noninformative prior for the variance components.
But the choice of $a_1 = 0$ or $a_2 = 0$ will give an improper posterior
distribution of $(\lambda_1, \lambda_2)^T$. So we tried several
combinations of these hyperparameters which are small positive numbers. Our
findings for this data set, provided in Table 2.5, are not different from
those for the BHF data set. We report our analysis for $a_0 = 0.0005$,
$a_1 = 0.05$, $a_2 = 0.01$ and $g_0 = g_1 = g_2 = 0$ in Table 2.6. The
estimated BLUPs for $w_{13}$ and $w_{56}$ reported in Harville (in press)
are 10.98 and 10.29, respectively, whereas the corresponding values we
obtained using a noninformative HB analysis are 11.0 and 10.4,
respectively. The agreement between the two sets of estimates is remarkably
close considering the fact that the underlying models (2.2.10) and (2.4.15)
are not identical.

Harville (in press) also estimated the difference $w_{13} - w_{56}$ and the
associated MSE of prediction by using both the variance components approach
and the HB approach. The estimated MSE of $w_{13} - w_{56}$ in the naive
EBLUP approach was given by $(0.955)^2$, whereas for the Kackar and
Harville (1984) approximation it was $(1.053)^2$ and for the Prasad and Rao
(1990) approximation it was $(1.143)^2$. For the HB approach, Harville (in
press) used a uniform prior both for the fixed effects and the variance
components associated with sire effect

Table 2.5  Birth Weights (in pounds) of Lambs

Line   Sire   Dam Age   Weight

1      1      1         6.2
1      2      1         13.0
1      3      1         9.5, 10.1, 11.4
1      3      2         11.8
1      3      3         12.9, 13.1
1      4      1         10.4
1      4      2         8.5

2      1      3         13.5
2      2      2         10.1
2      2      3         11.0, 14.0, 15.5
2      3      1         12.0
2      4      1         11.5
2      4      3         10.8

3      (sire and dam age labels for this line are not legible in the source)
                        9.0, 9.5, 12.6, 10.0, 10.1, 11.7, 8.5, 8.8, 9.9,
                        10.9, 11.0, 13.9, 11.6, 13.0, 12.0

4      1      1         9.2, 10.6, 10.6
4      1      3         7.7, 10.0, 11.2
4      2      1         10.2, 10.9
4      3      1         11.7
4      3      3         9.9

5      1      1         11.7, 12.6
5      2      1         9.0
5      2      3         11.0
5      3      3         9.0, 12.0
5      4      3         9.9
5      5      2         13.5
5      6      2         10.9
5      6      3         5.9
5      7      2         10.0, 12.7
5      7      3         13.2, 13.3
5      8      1         10.7, 11.0, 12.5
5      8      3         9.0, 10.2
Table 2.6  Predicted Birth Weights of Lambs and Associated Standard Errors

a_0 = .0005   g_0 = 0   a_1 = .05   g_1 = 0   a_2 = .01   g_2 = 0

Line   Sire   e_HB   s_HB

1      1      10.1   0.90
1      2      10.9   0.86
1      3      11.0   0.62
1      4      10.4   0.80

2      1      12.1   0.88
2      2      12.2   0.71
2      3      11.9   0.87
2      4      11.7   0.80

3      1      10.8   0.70
3      2      10.8   0.53
3      3      11.3   0.75
3      4      11.1   0.82

4      1      10.2   0.62
4      2      10.5   0.80
4      3      10.5   0.79

5      1      11.2   0.70
5      2      10.7   0.70
5      3      10.8   0.70
5      4      10.8   0.76
5      5      11.3   0.77
5      6      10.4   0.74
5      7      11.4   0.65
5      8      10.8   0.59
and error. The HB estimate of $w_{13} - w_{56}$ in this case is reported as
0.69 and the posterior s.d. is reported as 1.042. The corresponding values
obtained by using our approach are 0.60 and 0.99, respectively.

To conclude this section, we can recommend, on the basis of the analysis of
these data sets, the noninformative HB method as a viable alternative to
the usual EB or variance components approach. It should be given serious
consideration for prediction both in finite population sampling and in the
infinite population situation.


2.5   Hierarchical  Bayes  Prediction  of  Finite  Population 
Mean  Vector  in  Absence  of  Unit  Level  Observations 

Sometimes it is either difficult or impossible to obtain information at
the unit level for the small areas. In this section we will derive the
predictor of the finite population mean vector when we do not have
observations at the unit level. For $i = 1, \ldots, m$, for the $i$th small
area with $N_i$ units, we assume that based on a sample of size $n_i$ we
know only the sample mean $\bar{y}_{i(s)}$ of the characteristic of
interest and the sample mean vector $\bar{x}_{i(s)}$ ($p \times 1$) of the
auxiliary variables. Also, we have information on the population mean
vector $\bar{x}_{i(p)}$ ($p \times 1$) of the auxiliary variables of the
$N_i$ units in the $i$th small area of the population. We are interested in
predicting the finite population mean vector
$(\bar{Y}_1, \ldots, \bar{Y}_m)^T = \gamma$, where
$\gamma_i = \bar{Y}_i = N_i^{-1} \sum_{j=1}^{N_i} Y_{ij}$, based on
$(\bar{y}_{1(s)}, \ldots, \bar{y}_{m(s)})^T = \bar{y}_{(s)}$ (say),
$(\bar{x}_{1(s)}, \ldots, \bar{x}_{m(s)})$ and
$(\bar{x}_{1(p)}, \ldots, \bar{x}_{m(p)})$.

Let $Y_i = (Y_{i1}, \ldots, Y_{iN_i})^T$, $i = 1, \ldots, m$. Consider the
following model.

(i)   Conditional on $\eta_i$ ($N_i \times 1$), $b$ ($p \times 1$) and
      $\delta$, $Y_i \sim N(\eta_i, R_i^{-1} I_{N_i})$, $i = 1, \ldots, m$
      independently, where $R_i^{-1}$ are known sampling variances.
(ii)  Conditional on $b$ and $\delta$,
      $\eta_i \sim N(X_i b, \delta^{-1} I_{N_i})$, $i = 1, \ldots, m$
      independently.
(iii) $B$ and $\Delta$ are independent a priori with
      $B \sim$ uniform($R^p$) and
      $\Delta \sim$ gamma($\frac{1}{2} a, \frac{1}{2} g$).

Combining (i) and (ii) we have

(i')  conditional on $b$ and $\delta$,
      $Y_i \sim N(X_i b, (R_i^{-1} + \delta^{-1}) I_{N_i})$,
      $i = 1, \ldots, m$ independently.

Carter and Rolph (1974) introduced this type of model, and Fay and Herriot
(1979) considered an EB approach to this problem in the special case
$\eta_i = \theta_i 1_{N_i}$ and assumed in place of (ii) that conditional
on $b$ and $\delta$, $\theta_i \sim N(x_i^T b, \delta^{-1})$,
$i = 1, \ldots, m$ independently. Subsequently, in place of (i') they
assumed that conditional on $b$ and $\delta$,
$Y_i \sim N((x_i^T b) 1_{N_i}, R_i^{-1} I_{N_i} + \delta^{-1} J_{N_i})$,
$i = 1, \ldots, m$ independently. Fay and Herriot (1979) put a
uniform($R^p$) prior on $B$. However, instead of assigning a distribution
to the prior variance $\delta^{-1}$, they estimated it iteratively by
applying a generalized least squares procedure to
$\bar{Y}_{i(s)} \sim N(x_i^T b, R_i^{-1} n_i^{-1} + \delta^{-1})$,
$i = 1, \ldots, m$ independently, where $\bar{Y}_{i(s)}$ is the sample mean
based on $n_i$ units. They estimated $E(\bar{Y}_i) = \theta_i$,
$i = 1, \ldots, m$, based on their superpopulation model, whereas we are
interested in predicting the finite population mean $\bar{Y}_i$,
$i = 1, \ldots, m$, based on (i') and (iii).

Now we will get back to our problem. For the sake of notational
simplicity, we will assume without any loss of generality that the sample
mean $\bar{y}_{i(s)}$ is based on the first $n_i$ units and is given by
$\bar{y}_{i(s)} = n_i^{-1} \sum_{j=1}^{n_i} y_{ij}$. Now define
$\bar{Y}_{i(u)} = (N_i - n_i)^{-1} \sum_{j=n_i+1}^{N_i} Y_{ij}$ and
$\bar{x}_{i(u)} = (N_i - n_i)^{-1} \sum_{j=n_i+1}^{N_i} x_{ij}$. We have
also defined in Section 2.4 that $f_i = (N_i - n_i)/N_i$,
$i = 1, \ldots, m$. Since
$\bar{Y}_i = (1 - f_i) \bar{Y}_{i(s)} + f_i \bar{Y}_{i(u)}$ and
$\bar{y}_{i(s)}$ is known, to predict the vector $\gamma$ it is enough to
find the predictor of
$(\bar{Y}_{1(u)}, \ldots, \bar{Y}_{m(u)})^T = \alpha$ (say). So to predict
the vector $\alpha$, it is enough to find the predictive distribution of
$\alpha$ given the sample mean vector $\bar{y}_{(s)}$.

For any quadratic loss, the predictor of $\gamma_i$ is given by

\[
E(\gamma_i \mid \bar{y}_{(s)}) = (1 - f_i) \bar{y}_{i(s)} + f_i E(\alpha_i \mid \bar{y}_{(s)})
\tag{2.5.1}
\]

and its posterior variance is given by

\[
V(\gamma_i \mid \bar{y}_{(s)}) = f_i^2 V(\alpha_i \mid \bar{y}_{(s)}).
\tag{2.5.2}
\]
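Equations (2.5.1) and (2.5.2) reduce the prediction of the finite population mean to the first two posterior moments of the nonsampled mean $\alpha_i$. A minimal sketch of this bookkeeping step follows; the function name and the numbers in the usage line are illustrative, and the posterior moments of $\alpha_i$ are treated as given inputs.

```python
def finite_population_moments(N, n, ybar_s, alpha_mean, alpha_var):
    """Apply (2.5.1) and (2.5.2) for one small area.

    N, n        population and sample sizes of the area
    ybar_s      observed sample mean
    alpha_mean  posterior mean of the mean of the nonsampled units
    alpha_var   posterior variance of the mean of the nonsampled units
    """
    f = (N - n) / N                          # fraction of nonsampled units
    mean = (1.0 - f) * ybar_s + f * alpha_mean
    var = f * f * alpha_var
    return mean, var


# Illustrative numbers only: 20 of 100 units sampled.
mean_gamma, var_gamma = finite_population_moments(100, 20, 10.0, 12.0, 4.0)
```

When the area is fully sampled ($n_i = N_i$), `f` is zero and the predictor collapses to the observed sample mean with zero posterior variance.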


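Now from (i'), given $b$ and $\delta$,
$(\bar{Y}_{1(s)}, \ldots, \bar{Y}_{m(s)}, \bar{Y}_{1(u)}, \ldots, \bar{Y}_{m(u)})^T$
is multivariate normal (MVN) with independent components, whose $i$th and
$(i+m)$th components have means $\bar{x}_{i(s)}^T b$ and
$\bar{x}_{i(u)}^T b$ and variances
$\sigma_i^2 = (R_i^{-1} + \delta^{-1})/n_i$ and
$\sigma_{i+m}^2 = (R_i^{-1} + \delta^{-1})/(N_i - n_i)$,
$i = 1, \ldots, m$. From this, it is easy to derive that given
$\bar{y}_{(s)}$, $b$ and $\delta$, $\alpha$ is MVN with

\[
E(\alpha_i \mid \bar{y}_{(s)}, b, \delta) = \bar{x}_{i(u)}^T b;
\tag{2.5.3}
\]

\[
\mathrm{Cov}(\alpha_i, \alpha_k \mid \bar{y}_{(s)}, b, \delta) = \sigma_{i+m}^2 \delta_{ik},
\tag{2.5.4}
\]

where $\delta_{ik}$ is the Kronecker delta, which is 1 if $i = k$ and is
zero otherwise.

Using the iterative formulas for expectation and variance, we have from
(2.5.1) - (2.5.4)

\[
\begin{aligned}
E(\gamma_i \mid \bar{y}_{(s)})
&= (1 - f_i) \bar{y}_{i(s)} + f_i E\big[ E(\alpha_i \mid \bar{y}_{(s)}, B, \Delta) \,\big|\, \bar{y}_{(s)} \big] \\
&= (1 - f_i) \bar{y}_{i(s)} + f_i E\big( \bar{x}_{i(u)}^T B \,\big|\, \bar{y}_{(s)} \big) \\
&= (1 - f_i) \bar{y}_{i(s)} + f_i \bar{x}_{i(u)}^T E(B \mid \bar{y}_{(s)})
\end{aligned}
\tag{2.5.5}
\]

and

\[
\begin{aligned}
V(\gamma_i \mid \bar{y}_{(s)})
&= f_i^2 \Big[ E\big\{ V(\alpha_i \mid \bar{y}_{(s)}, B, \Delta) \,\big|\, \bar{y}_{(s)} \big\}
+ V\big\{ E(\alpha_i \mid \bar{y}_{(s)}, B, \Delta) \,\big|\, \bar{y}_{(s)} \big\} \Big] \\
&= f_i^2 \Big[ (N_i - n_i)^{-1} \big\{ R_i^{-1} + E(\Delta^{-1} \mid \bar{y}_{(s)}) \big\}
+ \bar{x}_{i(u)}^T V(B \mid \bar{y}_{(s)}) \bar{x}_{i(u)} \Big].
\end{aligned}
\tag{2.5.6}
\]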


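Note that from (i) and (ii), we have

(iv)  given $b$ and $\delta$, $\bar{Y}_{(s)} \sim N(Ab, V)$, where

\[
A = (\bar{x}_{1(s)}, \ldots, \bar{x}_{m(s)})^T,
\tag{2.5.7}
\]

\[
V = \mathrm{Diag}(\sigma_1^2, \ldots, \sigma_m^2);
\tag{2.5.8}
\]

and from (iii) we can write

(v)   given $\delta$, $B \sim$ uniform($R^p$).

From (iv) and (v) we have the joint pdf of $\bar{Y}_{(s)}$ and $B$ given
$\delta$:

\[
f(\bar{y}_{(s)}, b \mid \delta) \propto
\Big[ \prod_{i=1}^{m} (1/\sigma_i) \Big]
\exp\Big[ -\tfrac{1}{2} (\bar{y}_{(s)} - Ab)^T V^{-1} (\bar{y}_{(s)} - Ab) \Big].
\tag{2.5.9}
\]

Now assume rank($A$) $= p$ and define

\[
\hat{b} = (A^T V^{-1} A)^{-1} A^T V^{-1} \bar{y}_{(s)}
= \Big( \sum_{i=1}^{m} \bar{x}_{i(s)} \bar{x}_{i(s)}^T / \sigma_i^2 \Big)^{-1}
\Big( \sum_{i=1}^{m} \bar{x}_{i(s)} \bar{y}_{i(s)} / \sigma_i^2 \Big),
\tag{2.5.10}
\]

\[
Q = V^{-1} - V^{-1} A (A^T V^{-1} A)^{-1} A^T V^{-1}.
\tag{2.5.11}
\]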


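With these definitions, the quadratic form in the exponent of (2.5.9) can
be written as

\[
(\bar{y}_{(s)} - Ab)^T V^{-1} (\bar{y}_{(s)} - Ab)
= (b - \hat{b})^T (A^T V^{-1} A) (b - \hat{b}) + \bar{y}_{(s)}^T Q \bar{y}_{(s)}.
\tag{2.5.12}
\]

From (2.5.9) and (2.5.12), it follows that given $\bar{y}_{(s)}$ and
$\delta$, $B \sim N(\hat{b}, (A^T V^{-1} A)^{-1})$. Note from (2.5.10) that
$\hat{b}$ depends on $\bar{y}_{(s)}$ and on $\delta$, since the
$\sigma_i^2$ depend on $\delta$.

Again using the iterative formulas for expectation and variance, we have

\[
\begin{aligned}
E(B \mid \bar{y}_{(s)})
&= E\big[ E(B \mid \bar{y}_{(s)}, \Delta) \,\big|\, \bar{y}_{(s)} \big] \\
&= E\big[ \hat{b} \,\big|\, \bar{y}_{(s)} \big] \\
&= E\Big[ \Big( \sum_{i=1}^{m} \bar{x}_{i(s)} \bar{x}_{i(s)}^T / \sigma_i^2 \Big)^{-1}
\Big( \sum_{i=1}^{m} \bar{x}_{i(s)} \bar{y}_{i(s)} / \sigma_i^2 \Big) \,\Big|\, \bar{y}_{(s)} \Big]
\end{aligned}
\tag{2.5.13}
\]

and

\[
\begin{aligned}
V(B \mid \bar{y}_{(s)})
&= E\big[ V(B \mid \bar{y}_{(s)}, \Delta) \,\big|\, \bar{y}_{(s)} \big]
+ V\big[ E(B \mid \bar{y}_{(s)}, \Delta) \,\big|\, \bar{y}_{(s)} \big] \\
&= E\big[ (A^T V^{-1} A)^{-1} \,\big|\, \bar{y}_{(s)} \big] + V\big( \hat{b} \,\big|\, \bar{y}_{(s)} \big) \\
&= E\Big[ \Big( \sum_{i=1}^{m} \bar{x}_{i(s)} \bar{x}_{i(s)}^T / \sigma_i^2 \Big)^{-1} \,\Big|\, \bar{y}_{(s)} \Big]
+ V\Big[ \Big( \sum_{i=1}^{m} \bar{x}_{i(s)} \bar{x}_{i(s)}^T / \sigma_i^2 \Big)^{-1}
\Big( \sum_{i=1}^{m} \bar{x}_{i(s)} \bar{y}_{i(s)} / \sigma_i^2 \Big) \,\Big|\, \bar{y}_{(s)} \Big].
\end{aligned}
\tag{2.5.14}
\]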


So to evaluate $E(\gamma_i \mid \bar{y}_{(s)})$ and
$V(\gamma_i \mid \bar{y}_{(s)})$, it follows from (2.5.5), (2.5.6),
(2.5.13) and (2.5.14) that it is enough to evaluate
$E(\Delta^{-1} \mid \bar{y}_{(s)})$ and the quantities that appear on the
rhs of (2.5.13) and (2.5.14). In order to evaluate them, we need to find
the conditional distribution of $\Delta$ given $\bar{y}_{(s)}$.

From (iii), (iv) and (v), the joint pdf of $\bar{Y}_{(s)}$, $B$ and
$\Delta$ is given by

\[
f(\bar{y}_{(s)}, b, \delta) \propto
\Big[ \prod_{i=1}^{m} (1/\sigma_i) \Big]
\exp\Big[ -\tfrac{1}{2} (\bar{y}_{(s)} - Ab)^T V^{-1} (\bar{y}_{(s)} - Ab) \Big]
\delta^{\frac{1}{2} g - 1} \exp\big( -\tfrac{1}{2} a \delta \big).
\tag{2.5.15}
\]

Using (2.5.12) and the fact that given $\bar{y}_{(s)}$ and $\delta$,
$B \sim N(\hat{b}, (A^T V^{-1} A)^{-1})$, we find on integrating out $b$
from (2.5.15) that the joint pdf of $\bar{Y}_{(s)}$ and $\Delta$ is given
by

\[
f(\bar{y}_{(s)}, \delta) \propto
\Big[ \prod_{i=1}^{m} (1/\sigma_i) \Big]
\exp\Big[ -\tfrac{1}{2} \big\{ a \delta + \bar{y}_{(s)}^T Q \bar{y}_{(s)} \big\} \Big]
\delta^{\frac{1}{2} g - 1} \, |A^T V^{-1} A|^{-\frac{1}{2}}.
\tag{2.5.16}
\]

Since $f(\delta \mid \bar{y}_{(s)}) = f(\bar{y}_{(s)}, \delta) / f(\bar{y}_{(s)})$,
the conditional distribution of $\Delta$ given $\bar{y}_{(s)}$ is
determined by (2.5.16) up to the norming constant. Evaluations of (2.5.13)
and (2.5.14) are now accomplished by using (2.5.16) and, typically, some
numerical integration techniques.
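As an illustration of the kind of numerical integration referred to above, the following sketch evaluates (2.5.13) and (2.5.14) on a uniform grid of $\delta$ values. It is written under simplifying assumptions that are not part of the text: a single auxiliary variable ($p = 1$), so that $A^T V^{-1} A$ is a scalar, and a simple Riemann sum in place of a more careful quadrature rule. The argument `R` holds the quantities $R_i$ whose reciprocals are the known sampling variances; all names and the toy inputs are hypothetical.

```python
import math


def conditional_pieces(delta, ybar, xbar, n, R, a, g):
    """For p = 1: log of the rhs of (2.5.16), b_hat of (2.5.10), and (A'V^{-1}A)^{-1}."""
    sig2 = [(1.0 / r_i + 1.0 / delta) / n_i for r_i, n_i in zip(R, n)]
    sxx = sum(x * x / s for x, s in zip(xbar, sig2))            # A' V^{-1} A
    sxy = sum(x * y / s for x, y, s in zip(xbar, ybar, sig2))   # A' V^{-1} ybar
    syy = sum(y * y / s for y, s in zip(ybar, sig2))
    y_q_y = syy - sxy * sxy / sxx                               # ybar' Q ybar
    log_f = (-0.5 * sum(math.log(s) for s in sig2)
             - 0.5 * (a * delta + y_q_y)
             + (0.5 * g - 1.0) * math.log(delta)
             - 0.5 * math.log(sxx))
    return log_f, sxy / sxx, 1.0 / sxx


def posterior_moments_of_b(grid, ybar, xbar, n, R, a, g):
    """Approximate E(B | ybar) and V(B | ybar) of (2.5.13)-(2.5.14) on a uniform grid."""
    pieces = [conditional_pieces(d, ybar, xbar, n, R, a, g) for d in grid]
    top = max(p[0] for p in pieces)
    mass = [math.exp(p[0] - top) for p in pieces]  # uniform grid: spacing cancels
    total = sum(mass)
    prob = [mk / total for mk in mass]
    mean_b = sum(w * p[1] for w, p in zip(prob, pieces))
    var_b = (sum(w * p[2] for w, p in zip(prob, pieces))
             + sum(w * (p[1] - mean_b) ** 2 for w, p in zip(prob, pieces)))
    return mean_b, var_b


# Toy inputs: three areas, intercept-only auxiliary variable, equal sample means.
grid = [0.05 * k for k in range(1, 401)]
mean_b, var_b = posterior_moments_of_b(
    grid, [5.0, 5.0, 5.0], [1.0, 1.0, 1.0], [4, 6, 10], [1.0, 2.0, 0.5], a=1.0, g=1.0)
```

In the toy inputs every area has the same sample mean, so $\hat{b}$ equals that common value for every $\delta$ and the posterior mean of $B$ must reproduce it; this gives a quick sanity check on the implementation.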


CHAPTER  THREE 

OPTIMALITY  OF  BAYES  PREDICTORS 

FOR  MEANS  IN  A  SPECIAL  CASE 

3.1   Introduction

In Chapter Two, a hierarchical Bayes procedure was introduced for
prediction in mixed linear models, and in Section 2.4 the results were
utilized for prediction both in finite population sampling and in the
infinite population setup in the presence of auxiliary information. There
we considered the general case of unknown variance components and derived
the posterior distributions of interest by assigning an independent uniform
prior to the fixed effects and gamma priors to the inverses of the variance
components.

In this chapter, we will consider a special case. We assume that the
ratios of variance components are known. We derive HB predictors for the
mean vector and prove some optimal properties of these predictors.

In Section 3.2, we consider the normal linear model (2.2.2) of Section 2.2
with $\lambda$, the vector of ratios of variance components, known. We
assign a uniform prior to the vector of fixed effects $b$ and an
independent gamma prior to the inverse of the error variance. We first find
the posterior distribution of the nonsampled units given the sampled units
in finite population sampling, and from this we derive the HB predictor of
$\gamma$, the finite population mean vector. Later in this section, in the
infinite population situation, the posterior distribution of the vector of
fixed and random effects and the HB predictors for linear combinations of
fixed and random effects are determined. Our approach to these problems can
be regarded as an extension of the HB ideas of Lindley and Smith (1972) to
prediction.

Although developed within a Bayesian framework, our results should also
appeal to frequentists. For both problems, the BLUP notion for real valued
parameters (see, for example, Henderson, 1963; Royall, 1976) is extended in
Sections 3.3 and 3.4 to vector valued parameters, and it is shown that the
Bayesian predictors of Section 3.2 are indeed BLUPs. Like other related
papers, our BLUP results do not require any normality assumption. With the
added assumption of normality, the BLUPs indeed turn out to be best
unbiased predictors (BUPs) within the class of all unbiased predictors. In
addition, it is shown that these Bayes predictors are BUPs even for some
nonnormal distributions. In these sections, we have also shown that the
BLUPs "universally" (or "stochastically") dominate (cf. Hwang, 1985) the
linear unbiased predictors for elliptically symmetric distributions. In
Sections 3.5 and 3.6 we have shown that these Bayes predictors are best
equivariant predictors for both the matrix loss (or standardized matrix
loss) and quadratic loss (or standardized quadratic loss) under suitable
groups of transformations for a broad class of elliptically symmetric
distributions, including but not limited to the normal distribution.

We conclude this section by introducing some notation. For a square matrix
$T$ ($t \times t$), tr($T$) denotes its trace. For a symmetric nonnegative
definite (n.n.d.) matrix $T$, $T^{1/2}$ is a symmetric n.n.d. matrix such
that $T^{1/2} T^{1/2} = T$, and for a symmetric p.d. matrix $T$,
$T^{-1/2}$ is a symmetric p.d. matrix such that
$T^{-1/2} = (T^{1/2})^{-1}$.

3.2   The Hierarchical Bayes Predictor in a Special Case

We will assume the normal linear model (2.2.2) of Section 2.2. We consider
in this section the special case when $\lambda$, the vector of the ratios
of variance components, is known, while $B$ and $R$ are independently
distributed with $B \sim$ uniform($R^p$) and
$R \sim$ gamma($\frac{1}{2} a_0, \frac{1}{2} g_0$). Here we will consider
the case of finite population sampling in detail and briefly mention the
corresponding results for the infinite population setup. In finite
population sampling, since $\gamma = AY^{(1)} + CY^{(2)}$ for some suitable
$A$ and $C$, we are still interested in finding the predictor of
$\xi(Y^{(1)}, Y^{(2)})$ (recall that
$\xi(Y^{(1)}, Y^{(2)}) = AY^{(1)} + CY^{(2)}$), and for this it suffices to
find the predictive distribution of $Y^{(2)}$ given $Y^{(1)} = y^{(1)}$.
Recall the notations $K$, $M$ and $G$ given in (2.3.4) - (2.3.6). Since
$\lambda$ is known in this case, we have the following Theorem 3.2.1
instead of Theorem 2.3.1. The proof of Theorem 3.2.1 is similar to that of
Theorem 2.3.1 and is omitted.


Theorem 3.2.1.   Assume that $n_T + g_0 - p > 2$. Then under the model
given in (A) and (B) in Section 2.2 with $\lambda$ known, and independent
uniform($R^p$) prior for $B$ and gamma($\frac{1}{2} a_0, \frac{1}{2} g_0$)
prior for $R$, the predictive distribution of $Y^{(2)}$ given
$Y^{(1)} = y^{(1)}$ is a multivariate t-distribution with d.f.
$n_T + g_0 - p$, location parameter $My^{(1)}$ and scale parameter
$(n_T + g_0 - p)^{-1} (a_0 + y^{(1)T} K y^{(1)}) G$.

Using the properties of the multivariate t-distribution, it is possible
now to obtain closed form expressions for
$E[\xi(Y^{(1)}, Y^{(2)}) \mid Y^{(1)} = y^{(1)}]$ and
$V[\xi(Y^{(1)}, Y^{(2)}) \mid Y^{(1)} = y^{(1)}]$. In particular, the Bayes
estimate of $\xi(Y^{(1)}, Y^{(2)})$ under any quadratic loss is now given
by

\[
e_{BP}(y^{(1)}) = (A + CM) y^{(1)}.
\tag{3.2.1}
\]

We may note that the predictor $e_{BP}(Y^{(1)})$ given in (3.2.1) is the
outcome of the model given by (A) and (B) with $\lambda$ known and the use
of a uniform($R^p$) prior on $B$, and that it does not depend on the choice
of the (proper) prior distribution of $R$. This can be formally seen,
assuming all the expectations appearing below exist, as follows:

\[
\begin{aligned}
E(Y^{(2)} \mid y^{(1)})
&= E\Big[ E\big\{ E(Y^{(2)} \mid B, R, y^{(1)}) \,\big|\, R, y^{(1)} \big\} \,\Big|\, y^{(1)} \Big] \\
&= E\Big[ E\big\{ X^{(2)} B + \Sigma_{21} \Sigma_{11}^{-1} (y^{(1)} - X^{(1)} B) \,\big|\, R, y^{(1)} \big\} \,\Big|\, y^{(1)} \Big] \\
&= E\Big[ \Sigma_{21} \Sigma_{11}^{-1} y^{(1)}
+ (X^{(2)} - \Sigma_{21} \Sigma_{11}^{-1} X^{(1)}) E(B \mid R, y^{(1)}) \,\Big|\, y^{(1)} \Big] \\
&= \Sigma_{21} \Sigma_{11}^{-1} y^{(1)}
+ (X^{(2)} - \Sigma_{21} \Sigma_{11}^{-1} X^{(1)})
(X^{(1)T} \Sigma_{11}^{-1} X^{(1)})^{-1} X^{(1)T} \Sigma_{11}^{-1} y^{(1)} \\
&= M y^{(1)},
\end{aligned}
\tag{3.2.2}
\]

where in the above string of equalities, the second equality follows from
the fact that conditional on $B = b$, $R = r$ and $Y^{(1)} = y^{(1)}$,
$Y^{(2)} \sim N\big( X^{(2)} b + \Sigma_{21} \Sigma_{11}^{-1} (y^{(1)} - X^{(1)} b), \; r^{-1} \Sigma_{22.1} \big)$;
the fourth follows from the fact that conditional on $R = r$ and
$Y^{(1)} = y^{(1)}$,
$B \sim N\big( (X^{(1)T} \Sigma_{11}^{-1} X^{(1)})^{-1} X^{(1)T} \Sigma_{11}^{-1} y^{(1)}, \; r^{-1} (X^{(1)T} \Sigma_{11}^{-1} X^{(1)})^{-1} \big)$;
and the last equality follows from the definition of $M$ given in (2.3.5).
Thus, the predictor $e_{BP}(Y^{(1)})$ is robust against the choice of
priors for $R$.

There are alternate ways to generate the same
predictor $e_{BF}(Y^{(1)})$ of $\xi(Y^{(1)}, Y^{(2)})$. Suppose, for
example, one assumes only (A) and (B) with b known (r may or
may not be known). Then the best predictor (best linear
predictor without the normality assumption) of
$\xi(Y^{(1)}, Y^{(2)})$ in the sense of having the smallest mean
squared error matrix is given by

$$E_\theta[\xi(Y^{(1)}, Y^{(2)}) \mid Y^{(1)}]
 = AY^{(1)} + C\big[X^{(2)}b + \Sigma_{21}\Sigma_{11}^{-1}(Y^{(1)} - X^{(1)}b)\big], \tag{3.2.3}$$

where $\theta = (b^T, r)^T$. (We say that $E \le F$ for two symmetric
matrices E and F if F - E is n.n.d.) If b is unknown, then
one replaces b by its UMVUE (BLUE without the normality
assumption)
$(X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1}X^{(1)T}\Sigma_{11}^{-1}Y^{(1)}$.
The resulting predictor of $\xi(Y^{(1)}, Y^{(2)})$ turns out to be
$e_{BF}(Y^{(1)})$. In this sense, $e_{BF}(Y^{(1)})$ is also an EB
predictor of $\xi(Y^{(1)}, Y^{(2)})$.

Similarly, in this special case, one can derive the HB
predictor (for quadratic loss) of $\zeta(b, v)$ in the context of
infinite population set up. Denoting this HB predictor
$E(\zeta(b, v) \mid Y)$ by $e_{BI}(Y)$, one can see that all the
arguments leading to the empirical Bayes interpretation of
$e_{BF}(Y^{(1)})$ work equally well to show that $e_{BI}(Y)$ also
possesses the empirical Bayes interpretation. Harville
(1985, 1988, in press) recognized this for predicting scalars.
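
The EB interpretation above can be checked numerically. The following is a minimal sketch, not part of the original development: it draws an arbitrary positive definite $\Sigma$ and arbitrary A, C, X (all sizes and the helper name `build_M` are assumptions made for illustration), writes M in the form implied by (3.2.2), and compares the plug-in predictor obtained from (3.2.3) with $(A + CM)y^{(1)}$.

```python
import numpy as np

def build_M(X1, X2, S11, S21):
    """M in the form implied by (3.2.2); helper name is hypothetical."""
    S11_inv = np.linalg.inv(S11)
    # gls maps y1 to the generalized least squares (BLUE) estimate of b
    gls = np.linalg.solve(X1.T @ S11_inv @ X1, X1.T @ S11_inv)
    return S21 @ S11_inv + (X2 - S21 @ S11_inv @ X1) @ gls

rng = np.random.default_rng(0)
n1, n2, p, u = 6, 4, 2, 3                 # hypothetical dimensions
X = rng.normal(size=(n1 + n2, p))
L = rng.normal(size=(n1 + n2, n1 + n2))
Sigma = L @ L.T + np.eye(n1 + n2)         # arbitrary positive definite Sigma
X1, X2 = X[:n1], X[n1:]
S11, S21 = Sigma[:n1, :n1], Sigma[n1:, :n1]
A = rng.normal(size=(u, n1))
C = rng.normal(size=(u, n2))
y1 = rng.normal(size=n1)

M = build_M(X1, X2, S11, S21)
S11_inv = np.linalg.inv(S11)
b_hat = np.linalg.solve(X1.T @ S11_inv @ X1, X1.T @ S11_inv @ y1)

# best predictor (3.2.3) with b replaced by its BLUE
plug_in = A @ y1 + C @ (X2 @ b_hat + S21 @ S11_inv @ (y1 - X1 @ b_hat))
# predictor (3.2.1)
e_bf = (A + C @ M) @ y1
```

Under these assumptions `plug_in` and `e_bf` agree up to rounding, and `M @ X1` reproduces `X2`, which is the identity used later when unbiasedness of $e_{BF}$ is needed.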

In the next four sections, we will discuss a few
frequentist properties of $e_{BF}(Y^{(1)})$ and $e_{BI}(Y)$. In
Section 3.3 we show $e_{BF}(Y^{(1)})$ is best unbiased predictor
and consider its stochastic domination whereas in Section 3.4
we consider these properties for $e_{BI}(Y)$. In Section 3.5 we
show that $e_{BF}(Y^{(1)})$ is best equivariant predictor of
$\xi(Y^{(1)}, Y^{(2)})$ under suitable groups of transformations,
whereas in Section 3.6 we consider the best equivariance
property of $e_{BI}(Y)$ under the same groups of
transformations. Jeske and Harville (1987) have shown that
the scalar BLUPs are best equivariant within the class of all
linear equivariant predictors without any distributional
assumption. However, to our knowledge, the equivariance
results for vector valued predictors have not been
addressed before in this context in their full generality.
For an illuminating discussion on equivariant (or invariant
as some authors call it) estimation, one may refer to
Ferguson (1967) and Lehmann (1983).

3.3    Best Unbiased Prediction and Stochastic
Domination in Small Area Estimation


In this section, we assume the normal linear model in
(2.2.2) with $\lambda$ known. No prior distribution for B and R is
assumed, and $\theta = (b^T, r)^T$ is treated as an unknown
parameter. First, we prove the optimality of $e_{BF}(Y^{(1)})$
within the class of all unbiased predictors of
$\xi(Y^{(1)}, Y^{(2)})$. Next, we dispense with the normality
assumption of v and e, and prove the optimality of
$e_{BF}(Y^{(1)})$ within the class of all linear unbiased
predictors (LUPs).

We start with the following definition of a best
unbiased predictor (BUP).


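Definition 3.3.1. A predictor $T(Y^{(1)})$ is said to be a BUP
of $\xi(Y^{(1)}, Y^{(2)})$ if
$E_\theta[T(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)})] = 0$ for all $\theta$
and for every predictor $\delta(Y^{(1)})$ of $\xi(Y^{(1)}, Y^{(2)})$
satisfying $E_\theta[\delta(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)})] = 0$
for all $\theta$,

$$V_\theta[\delta(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)})] - V_\theta[T(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)})]$$

is n.n.d. for all $\theta$ provided the quantities are finite.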
The following general lemma plays a key role in
proving the best unbiasedness of the predictor $e_{BF}(Y^{(1)})$ of
$\xi(Y^{(1)}, Y^{(2)})$. The lemma concerns unbiased prediction of a
general g(Y) (u x 1) based on $Y^{(1)}$, where g(Y) is not
necessarily equal to $\xi(Y^{(1)}, Y^{(2)})$ or linear in Y. We
assume that each component of g(Y) has a finite second
moment. Denote by $U_g$ the class of all unbiased predictors
$\delta(Y^{(1)})$ of g(Y) with each component of $\delta(Y^{(1)})$
having a finite second moment. Also, let $U_0$ denote the class
of all real valued statistics (i.e., functions of $Y^{(1)}$) with
finite second moments having zero expectations identically
in $\theta$.


Lemma 3.3.1. A predictor $T(Y^{(1)}) \in U_g$ is BUP for g(Y) if
and only if

$$\mathrm{Cov}_\theta\big[T(Y^{(1)}) - g(Y),\; m(Y^{(1)})\big] = 0 \tag{3.3.1}$$

for all $\theta$ and for every $m \in U_0$.


Proof of Lemma 3.3.1. If. Let
$T(Y^{(1)}) = (T_1(Y^{(1)}), \ldots, T_u(Y^{(1)}))^T$ satisfy (3.3.1).
If $\delta(Y^{(1)}) = (\delta_1(Y^{(1)}), \ldots, \delta_u(Y^{(1)}))^T$
is another predictor in $U_g$, then

$$
\begin{aligned}
V_\theta[\delta(Y^{(1)}) - g(Y)]
&= V_\theta[T(Y^{(1)}) - g(Y)] + V_\theta[\delta(Y^{(1)}) - T(Y^{(1)})] \\
&\quad + \mathrm{Cov}_\theta[\delta(Y^{(1)}) - T(Y^{(1)}),\; T(Y^{(1)}) - g(Y)] \\
&\quad + \mathrm{Cov}_\theta[T(Y^{(1)}) - g(Y),\; \delta(Y^{(1)}) - T(Y^{(1)})].
\end{aligned}
\tag{3.3.2}
$$

Now $T_i(Y^{(1)}) - \delta_i(Y^{(1)}) \in U_0$ for every i = 1,..., u.
Hence using the condition of the lemma, for $1 \le i \le u$,

$$\mathrm{Cov}_\theta[T(Y^{(1)}) - g(Y),\; T_i(Y^{(1)}) - \delta_i(Y^{(1)})] = 0. \tag{3.3.3}$$

From (3.3.2) and (3.3.3) it follows that

$$V_\theta[\delta(Y^{(1)}) - g(Y)] = V_\theta[T(Y^{(1)}) - g(Y)] + V_\theta[\delta(Y^{(1)}) - T(Y^{(1)})], \tag{3.3.4}$$

for all $\theta$. Hence $T(Y^{(1)})$ is BUP for g(Y).

Only if. Given that $T(Y^{(1)})$ is BUP, we will show that the
condition (3.3.1) is true. First we will show that $T_i(Y^{(1)})$
is BUP for $g_i(Y)$ for every i = 1,..., u. Let $U_i(Y^{(1)})$ be any
unbiased predictor for $g_i(Y)$. Then $\delta^*(Y^{(1)})$, a
u-component column vector with i-th component equal to
$U_i(Y^{(1)})$, belongs to $U_g$. Then

$$V_\theta[\delta^*(Y^{(1)}) - g(Y)] - V_\theta[T(Y^{(1)}) - g(Y)]$$

is n.n.d. So we have

$$V_\theta[U_i(Y^{(1)}) - g_i(Y)] - V_\theta[T_i(Y^{(1)}) - g_i(Y)] \ge 0, \qquad i = 1, \ldots, u,$$

and consequently $T_i(Y^{(1)})$ is BUP for $g_i(Y)$. Now following
the usual Lehmann-Scheffe (1950) technique (also Rao,
1952), for any $m \in U_0$,

$$\mathrm{Cov}_\theta[T_i(Y^{(1)}) - g_i(Y),\; m(Y^{(1)})] = 0 \tag{3.3.5}$$

for all $1 \le i \le u$. Hence, (3.3.1) holds, and the proof of
the lemma is complete.

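Remark 3.3.1. It follows from the above lemma (see
(3.3.4)) that if $T_1(Y^{(1)})$ and $T_2(Y^{(1)})$ are both BUPs of
g(Y), then

$$
\begin{aligned}
V_\theta[T_1(Y^{(1)}) - T_2(Y^{(1)})]
&= \mathrm{Cov}_\theta[T_1(Y^{(1)}) - g(Y),\; T_1(Y^{(1)}) - T_2(Y^{(1)})] \\
&\quad - \mathrm{Cov}_\theta[T_2(Y^{(1)}) - g(Y),\; T_1(Y^{(1)}) - T_2(Y^{(1)})] \\
&= 0 - 0 = 0 \text{ by (3.3.1), for all } \theta,
\end{aligned}
$$

i.e., $P_\theta[T_1(Y^{(1)}) = T_2(Y^{(1)})] = 1$ for all $\theta$.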


Remark  3.3.2.   It  is  also  clear  that  the  technique  of  the 
above  lemma  can  be  applied  in  more  general  contexts. 

We will use the above lemma to prove the BUP property
of $e_{BF}(Y^{(1)})$ in the following theorem. Recall from (3.2.1)
that $e_{BF}(Y^{(1)}) = (A + CM)Y^{(1)}$.

Theorem 3.3.1. Under the normal linear model (2.2.2),
$e_{BF}(Y^{(1)})$ is the BUP of $\xi(Y^{(1)}, Y^{(2)})$.

Proof of Theorem 3.3.1. In view of Lemma 3.3.1, it
suffices to show that for every $m(Y^{(1)}) \in U_0$,

$$\mathrm{Cov}_\theta[e_{BF}(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)}),\; m(Y^{(1)})] = 0 \text{ for all } \theta,$$

that is,
$E_\theta[C(MY^{(1)} - Y^{(2)})\,m(Y^{(1)})] = 0$ for all $\theta$. Since,
under the model (2.2.2),

$$MY^{(1)} - E_\theta(Y^{(2)} \mid Y^{(1)})
 = (X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})
   \big[(X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1}X^{(1)T}\Sigma_{11}^{-1}Y^{(1)} - b\big]$$

and $E_\theta[m(Y^{(1)})] = 0$, it suffices to show that

$$E_\theta\big[(X^{(1)T}\Sigma_{11}^{-1}Y^{(1)})\,m(Y^{(1)})\big] = 0 \tag{3.3.6}$$

for all $\theta$. Since $E_\theta[m(Y^{(1)})] = 0$ means

$$\int m(y^{(1)}) \exp\big[-\tfrac{1}{2}r\,(y^{(1)} - X^{(1)}b)^T\Sigma_{11}^{-1}(y^{(1)} - X^{(1)}b)\big]\,dy^{(1)} = 0,$$

differentiating both sides of this equation w.r.t. b, one
gets (see p. 318 of Rao, 1973)

$$\int X^{(1)T}\Sigma_{11}^{-1}(y^{(1)} - X^{(1)}b)\,m(y^{(1)})
 \exp\big[-\tfrac{1}{2}r\,(y^{(1)} - X^{(1)}b)^T\Sigma_{11}^{-1}(y^{(1)} - X^{(1)}b)\big]\,dy^{(1)} = 0. \tag{3.3.7}$$

Using $E_\theta[m(Y^{(1)})] = 0$ again, (3.3.6) follows from (3.3.7).
The proof of Theorem 3.3.1 is complete.

Remark 3.3.3. Equation (3.3.6) can be alternatively proved
in the following way. Note that since
$Y^{(1)} \sim N(X^{(1)}b,\; r^{-1}\Sigma_{11})$,
$(X^{(1)T}\Sigma_{11}^{-1}Y^{(1)},\; Y^{(1)T}KY^{(1)})$ is complete
sufficient for $\theta$. Hence $X^{(1)T}\Sigma_{11}^{-1}Y^{(1)}$ must have
0 covariance vector with every zero estimator $m(Y^{(1)})$, i.e.,
(3.3.6) holds.

Next we show that the conclusion of Theorem 3.3.1
continues to hold even for certain nonnormal distributions.
Suppose that $e^* = (v^T, e^T)^T$ and $\Lambda = \mathrm{Diag}(D, \Psi)$.
Assume that given R = r, $e^* \sim N(0, r^{-1}\Lambda)$, while the df
of R is an arbitrary member of the family
$\mathcal{F}$ = {F: F is absolutely continuous with pdf
f(r) = 0 for r < 0}. Let $\mathcal{F}^*$ denote a subfamily of
$\mathcal{F}$ such that each component of $e_{BF}(Y^{(1)})$ and
$\xi(Y^{(1)}, Y^{(2)})$ has finite second moment under the model
(2.2.2) and the joint distribution of $e^*$ and R. We now
prove the following theorem.

Theorem 3.3.2. $e_{BF}(Y^{(1)})$ is BUP of $\xi(Y^{(1)}, Y^{(2)})$ under
the model (2.2.2), $e^* \mid R = r \sim N(0, r^{-1}\Lambda)$ and R has a df
from $\mathcal{F}^*$.


Proof of Theorem 3.3.2. Using Lemma 3.3.1, and following
the proof of Theorem 3.3.1, it suffices to show that

$$E_{b,F}\big[(X^{(1)T}\Sigma_{11}^{-1}Y^{(1)})\,m(Y^{(1)})\big] = 0 \tag{3.3.8}$$

for all $b \in R^p$ and for all $F \in \mathcal{F}^*$, where

$$E_{b,F}[m(Y^{(1)})] = 0 \quad\text{and}\quad E_{b,F}[m^2(Y^{(1)})] < \infty \tag{3.3.9}$$

for all b and all $F \in \mathcal{F}^*$. Consider the subfamily
$\mathcal{K}$ = {R ~ gamma($\tfrac{1}{2}c$, $\tfrac{1}{2}d$): c > 0, d > 2}
of $\mathcal{F}^*$. Since (3.3.9) holds for this subfamily
$\mathcal{K}$, $E_{b,F}[m(Y^{(1)})] = 0$ for all b and all
$F \in \mathcal{K}$ gives

$$\int_0^\infty \exp(-\tfrac{1}{2}cr)\, r^{\frac{1}{2}(n_T + d) - 1}
 \Big\{\int \exp\big[-\tfrac{1}{2}r\,(y^{(1)} - X^{(1)}b)^T\Sigma_{11}^{-1}(y^{(1)} - X^{(1)}b)\big]
 \, m(y^{(1)})\,dy^{(1)}\Big\}\,dr = 0 \tag{3.3.10}$$

for all b, c > 0 and d > 2. Now using the uniqueness
property of Laplace transforms, it follows from (3.3.10)
that

$$r^{\frac{1}{2}(n_T + d) - 1}\int \exp\big[-\tfrac{1}{2}r\,(y^{(1)} - X^{(1)}b)^T\Sigma_{11}^{-1}(y^{(1)} - X^{(1)}b)\big]
 \, m(y^{(1)})\,dy^{(1)} = 0$$

a.e. Lebesgue for all r > 0 and all b, i.e.,

$$\int \exp\big[-\tfrac{1}{2}r\,(y^{(1)} - X^{(1)}b)^T\Sigma_{11}^{-1}(y^{(1)} - X^{(1)}b)\big]
 \, m(y^{(1)})\,dy^{(1)} = 0 \tag{3.3.11}$$

a.e. Lebesgue for all r > 0 and all b. Differentiation of
both sides of (3.3.11) with respect to b and some
simplifications using (3.3.11) lead to

$$\int (X^{(1)T}\Sigma_{11}^{-1}y^{(1)})\,m(y^{(1)})
 \exp\big[-\tfrac{1}{2}r\,(y^{(1)} - X^{(1)}b)^T\Sigma_{11}^{-1}(y^{(1)} - X^{(1)}b)\big]\,dy^{(1)} = 0 \tag{3.3.12}$$

a.e. Lebesgue for all r > 0 and all b. Multiplying both
sides of (3.3.12) by $r^{\frac{1}{2}n_T}$ and integrating with
respect to dF(r) where $F \in \mathcal{F}^*$, one gets (3.3.8).

Remark 3.3.4. Since $\mathcal{F}^*$ does not contain the degenerate
distributions of R on $(0, \infty)$, Theorem 3.3.1 does not follow
from Theorem 3.3.2.

Remark 3.3.5. In Theorem 3.3.2, if we take $\mathcal{K}$ for
$\mathcal{F}^*$, we see that the marginal distribution of Y is
given by the family of distributions
$\{\mathcal{T}(\cdot \mid N_T, Xb, (c/d)\Sigma, d):\; b \in R^p,\; c > 0,\; d > 2\}$
and $e_{BF}(Y^{(1)})$ is BUP for $\xi(Y^{(1)}, Y^{(2)})$ for this
family, where $\mathcal{T}(\cdot \mid N_T, Xb, (c/d)\Sigma, d)$ is
$N_T$-variate t-distribution with location parameter Xb,
scale parameter $(c/d)\Sigma$ and d.f. d.

Next we will show that the predictor $e_{BF}(Y^{(1)})$ (which
is linear in $Y^{(1)}$) is a best linear unbiased predictor
(BLUP) of $\xi(Y^{(1)}, Y^{(2)})$. A predictor $\delta(Y^{(1)})$ is said to
be linear if $\delta(Y^{(1)})$ has the form $HY^{(1)}$ for some known
$u \times n_T$ matrix H. If in addition
$E_\theta[\delta(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)})] = 0$ for all $\theta$, we
say that $\delta(Y^{(1)})$ is a LUP of $\xi(Y^{(1)}, Y^{(2)})$. We
need the following definition.

Definition 3.3.2. A LUP $PY^{(1)}$ of $\xi(Y^{(1)}, Y^{(2)})$ is said
to be a BLUP if for every LUP $HY^{(1)}$ of $\xi(Y^{(1)}, Y^{(2)})$,

$$V_\theta\big(HY^{(1)} - \xi(Y^{(1)}, Y^{(2)})\big) - V_\theta\big(PY^{(1)} - \xi(Y^{(1)}, Y^{(2)})\big)$$

is n.n.d. for all $\theta$.

We now prove the BLUP property of $e_{BF}(Y^{(1)})$ for
predicting $\xi(Y^{(1)}, Y^{(2)})$. To this end, we will state a
lemma whose proof is similar to the proof of Lemma 3.3.1
and hence the proof will be omitted.

Lemma 3.3.2. A LUP $PY^{(1)}$ of $\xi(Y^{(1)}, Y^{(2)})$ is a BLUP if
and only if

$$\mathrm{Cov}_\theta\big(PY^{(1)} - \xi(Y^{(1)}, Y^{(2)}),\; m^TY^{(1)}\big) = 0 \tag{3.3.13}$$

for all $\theta$ and for every known $n_T \times 1$ vector m satisfying
$E_\theta(m^TY^{(1)}) = 0$ for all $\theta$.

The following theorem provides the BLUP property of
$e_{BF}(Y^{(1)})$ for predicting $\xi(Y^{(1)}, Y^{(2)})$. In proving this
BLUP property, we do not need any distributional assumption
on $e^*$. We only assume $E_\theta(e^*) = 0$ and
$V_\theta(e^*) = r^{-1}\Lambda$.

Theorem 3.3.3. Consider the linear model (2.2.2) without
any distributional assumption on $e^*$. Then $e_{BF}(Y^{(1)})$ is the
BLUP of $\xi(Y^{(1)}, Y^{(2)})$.

Proof of Theorem 3.3.3. If
$E_\theta(m^TY^{(1)}) = m^TX^{(1)}b = 0$ for all b, then
$m^TX^{(1)} = 0^T$. Hence,

$$
\begin{aligned}
\mathrm{Cov}_\theta\big[e_{BF}(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)}),\; m^TY^{(1)}\big]
&= \mathrm{Cov}_\theta\big[C(MY^{(1)} - Y^{(2)}),\; m^TY^{(1)}\big] \\
&= r^{-1}C(M\Sigma_{11} - \Sigma_{21})m \\
&= r^{-1}C(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})
   (X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1}X^{(1)T}m \\
&= 0
\end{aligned}
$$

for all $\theta$. The last two equalities follow from the
definition of M and from the fact $m^TX^{(1)} = 0^T$. Applying
Lemma 3.3.2, the result follows.


Remark 3.3.6. As already mentioned, the normality
assumption is not needed for proving the BLUP property of
$e_{BF}(Y^{(1)})$. Theorem 3.3.3 unifies and extends the available
BLUP results related to the estimation of the finite
population mean vector under different models (cf. Ghosh
and Lahiri, in press; Royall, 1976; and others). As in
Remark 3.3.1 one can prove that the BLUP is unique with
probability one.
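
The BLUP property can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than part of the proof: it takes r = 1, draws arbitrary $\Sigma$, X, A and C of hypothetical sizes, builds M in the form implied by (3.2.2), and forms a competing LUP $HY^{(1)}$ by adding to $P = A + CM$ a perturbation $\Gamma$ with $\Gamma X^{(1)} = 0$ (the variable names are not from the text).

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, p, u = 6, 4, 2, 3                 # hypothetical dimensions
N = n1 + n2
X = rng.normal(size=(N, p))
L = rng.normal(size=(N, N))
Sigma = L @ L.T + np.eye(N)               # arbitrary positive definite Sigma
X1, X2 = X[:n1], X[n1:]
S11, S21 = Sigma[:n1, :n1], Sigma[n1:, :n1]
A = rng.normal(size=(u, n1))
C = rng.normal(size=(u, n2))

S11_inv = np.linalg.inv(S11)
gls = np.linalg.solve(X1.T @ S11_inv @ X1, X1.T @ S11_inv)
M = S21 @ S11_inv + (X2 - S21 @ S11_inv @ X1) @ gls

# Gamma annihilates X1, so H = P + Gamma is again linear and unbiased
proj = np.eye(n1) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
Gamma = rng.normal(size=(u, n1)) @ proj
P = A + C @ M
H = P + Gamma

def error_cov(Hmat):
    """Covariance of Hmat @ Y1 - (A @ Y1 + C @ Y2) when r = 1."""
    B = np.hstack([Hmat - A, -C])
    return B @ Sigma @ B.T

diff = error_cov(H) - error_cov(P)
min_eig = np.linalg.eigvalsh((diff + diff.T) / 2).min()
```

In this construction `diff` equals `Gamma @ S11 @ Gamma.T`, so its smallest eigenvalue `min_eig` is nonnegative up to rounding, which is the n.n.d. ordering required by Definition 3.3.2.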

It follows as a consequence of Theorem 3.3.3 that
$e_{BF}(Y^{(1)})$ has the smallest risk within the class of all
LUPs of $\xi(Y^{(1)}, Y^{(2)}) = \xi$ under the matrix loss

$$L_0(\xi, \delta) = (\delta - \xi)(\delta - \xi)^T \tag{3.3.14}$$

and the model (2.2.2) without any distributional assumption
on $e^*$. The optimality of $e_{BF}(Y^{(1)})$ within LUPs holds
a fortiori under the quadratic loss

$$L_1(\xi, \delta) = \|\delta - \xi\|_\Omega^2
 = (\delta - \xi)^T\Omega(\delta - \xi)
 = \mathrm{tr}[\Omega L_0(\xi, \delta)], \tag{3.3.15}$$

where $\Omega$ is a n.n.d. matrix. Such a loss will, henceforth,
be referred to as generalized Euclidean error w.r.t. $\Omega$.
The optimality results carry over via Theorem 3.3.1 and
Theorem 3.3.2 under the added distributional assumption
(which is not necessarily normality assumption) on $e^*$. A
natural question to ask now is whether the risk optimality
of $e_{BF}(Y^{(1)})$ holds within the class of all unbiased
predictors, or at least within the class of LUPs under
certain other criterion for a broader family of
distributions of $e^*$. To investigate this question, we need
the notions of "universal" and "stochastic" domination, and
their interrelationship as given in Hwang (1985). Let

$$R_L(\theta, \xi; \delta) = E_\theta\Big[L\big(\|\delta(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)})\|_\Omega^2\big)\Big]$$

be the risk function of the predictor $\delta$ for predicting
$\xi$ under a loss function which is a function of generalized
Euclidean error w.r.t. $\Omega$ for some function L. The
following definition is adapted from Hwang (1985).

Definition 3.3.3. An estimator $\delta_1(Y^{(1)})$ universally
dominates $\delta_2(Y^{(1)})$ (under the generalized Euclidean error
w.r.t. $\Omega$) if for every $\theta$, and every nondecreasing loss
function L, $R_L(\theta, \xi; \delta_1) \le R_L(\theta, \xi; \delta_2)$
holds and for a particular loss, the risk functions are not
identical.

Hwang (1985) has shown that (see his Theorem 2.3) $\delta_1$
universally dominates $\delta_2$ under the generalized Euclidean
error w.r.t. $\Omega$ if
$\|\delta_1(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)})\|_\Omega^2$ is
stochastically smaller than
$\|\delta_2(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)})\|_\Omega^2$. We
say that a random variable $Z_1$ is stochastically smaller
than $Z_2$ if $P_\theta(Z_1 > x) \le P_\theta(Z_2 > x)$ for all x and
$\theta$, and for some $\theta$, $Z_1$ and $Z_2$ have distinct
distributions.

The next theorem shows that for a general class of
elliptically symmetric distributions of $e^*$, $e_{BF}(Y^{(1)})$
universally dominates every LUP $HY^{(1)}$ of $\xi(Y^{(1)}, Y^{(2)})$
under every generalized Euclidean error w.r.t. a n.n.d. $\Omega$.
Assume that $e^*$ has an elliptically symmetric pdf given by

$$h^*(e^* \mid \Lambda, r) \propto |r^{-1}\Lambda|^{-1/2}\, f(r\,e^{*T}\Lambda^{-1}e^*), \tag{3.3.16}$$

where as already defined $e^* = (v^T, e^T)^T$ and
$\Lambda = \mathrm{Diag}(D, \Psi)$, and the known nonnegative function f
is such that

$$\int \Big[\sum_{i=1}^{q}|v_i| + \sum_{j=1}^{N_T}|e_j| + 1\Big]\,
 f(r\,e^{*T}\Lambda^{-1}e^*)\,de^* < \infty, \tag{3.3.17}$$

where $v = (v_1, \ldots, v_q)^T$ and $e = (e_1, \ldots, e_{N_T})^T$. We
will denote this distribution by
$\mathcal{E}_{N_T + q}(0, r^{-1}\Lambda)$, where
$\mathcal{E}_p(\mu, \sigma^2\Omega^*)$ denotes the distribution whose
pdf is given by

$$k(t \mid \mu, \Omega^*, \sigma) \propto |\sigma^2\Omega^*|^{-1/2}\,
 f\big((t - \mu)^T\Omega^{*-1}(t - \mu)/\sigma^2\big), \tag{3.3.18}$$

where t and $\mu$ are in $R^p$, $\Omega^*$ (p x p) is p.d. and
$\sigma > 0$.

Note that the normality of $e^*$ with mean 0 and
variance-covariance matrix $r^{-1}\Lambda$ is sufficient but not
necessary for (3.3.16) and (3.3.17) to hold. It follows
from (3.3.17) that $E(e^*)$ exists and from (3.3.16) that
$E(e^*) = 0$.


,-1, 


Note  that  e    —    (^      4)   §   has  a  spherically  symmetric 
distribution  §^(0,  I  ^)  with  characteristic  function  (c.f .) 


E 


|exp(iu  e**)J  =  c(u  u)  for  some  function  c  (see  Kelker, 


1970)  where  i  =  -i-l ,    u  =  (uj,...,  u  *)   and  q*  =  Nrj,  +  q, 
Hence  e*  has  c.f.  given  by 


E[exp(iu'^e*)]  =  c(r~lu'^Au) 


(3.3.19) 


Now  write  \J*^    =    Z^-^V  +  e*^'^'*  ( j  =  1  ,  2)  .   Then  W' 
(WT  ,  Wq  1   has  a  c.f.  given  by 


E[exp(iu*'^W*)]  =  c(r-lu*'^Eu*) 


(3.3.20) 
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where  S  =  *  +  ZDZ^ .   Comparing  (3.3.16)  and  (3.3.19)  one 
can  see  from  (3.3.20)  that  W*  has  also  an  el  1 ipt ical ly 
symmetric  distribution  6^(0,  r~  E)  with  pdf  given  by 

h(w*|5,  r)  oc  |r~ls|~^f(rw*'^E~^w*).  (3.3.21) 

Theorem 3.3.4. Under the model (2.2.2), (3.3.16) and
(3.3.17), $e_{BF}(Y^{(1)})$ universally dominates every LUP
$\delta(Y^{(1)}) = HY^{(1)}$ of $\xi(Y^{(1)}, Y^{(2)})$ for every p.d.
$\Omega$.

Remark  3.3.7.   Theorem  3.3.4  does  not  contain  Theorem  3.3.3 
since  Theorem  3.3.4  requires  the  elliptical  symmetry  of  the 
distribution  of  W* ,  while  the  other  does  not.   It  should  be 
noted  though  that  the  model  assumption  made  in  (3.3.16)  is 
not  necessarily  stronger  than  the  usual  assumption  of 
finiteness  of  certain  moments.   This  is  because  the 
assumptions  of  Theorem  3.3.4  hold  even  if  a  distribution 
has  infinite  second  moment  (e.g. ,  for  certain  multivariate 
t) ,  but  the  BLUP  property  is  meaningless  in  such  instance. 

Now  we  will  state  and  prove  a  lemma.   The  proof  of 
Theorem  3.3.4  rests  crucially  on  this  lemma. 

Lemma 3.3.3. If W ($N_T \times 1$) has pdf $h(w \mid I_{N_T}, r)$,
then for every L ($u \times N_T$), $u \le N_T$,
$LW \overset{d}{=} (LL^T)^{1/2}W_u$, where $W_u = (I_u \;\; 0)W$,
0 ($u \times (N_T - u)$) is a null matrix and $\overset{d}{=}$
means equal in distribution.

Proof of Lemma 3.3.3. The proof follows the arguments of
Hwang (1985). From (3.3.19) it follows that W has c.f.
$E[\exp(it^TW)] = c(r^{-1}t^Tt)$. Hence

$$E[\exp(it_1^TLW)] = c(r^{-1}t_1^TLL^Tt_1), \tag{3.3.22}$$

where $t_1$ is a $u \times 1$ vector. Next using (3.3.22),

$$
\begin{aligned}
E\big[\exp\{it_1^T(LL^T)^{1/2}W_u\}\big]
&= E\big[\exp\{it_1^T(LL^T)^{1/2}(I_u \;\; 0)W\}\big] \\
&= c\big[r^{-1}t_1^T(LL^T)^{1/2}(I_u \;\; 0)(I_u \;\; 0)^T(LL^T)^{1/2}t_1\big] \\
&= c\big(r^{-1}t_1^T(LL^T)t_1\big),
\end{aligned}
\tag{3.3.23}
$$

so that the lemma follows from (3.3.22) and (3.3.23).
It follows as a consequence of Lemma 3.3.3 that

$$W^TL^TLW = (LW)^T(LW)
 \overset{d}{=} \big((LL^T)^{1/2}W_u\big)^T\big((LL^T)^{1/2}W_u\big)
 = W_u^TLL^TW_u. \tag{3.3.24}$$

We shall use (3.3.24) repeatedly for proving Theorem 3.3.4.
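
The matrix identity behind (3.3.23) can be verified directly. The following sketch (sizes and variable names are assumptions chosen for illustration) forms the symmetric square root of $LL^T$, checks that $(LL^T)^{1/2}(I_u \;\; 0)$ induces the same scale matrix $LL^T$ as L itself, and also checks that the nonzero eigenvalues of $L^TL$ coincide with the eigenvalues of $LL^T$, which is why the two quadratic forms in (3.3.24) can share a distribution for spherically symmetric W.

```python
import numpy as np

rng = np.random.default_rng(2)
u, N = 3, 7                                # hypothetical sizes with u <= N
L = rng.normal(size=(u, N))
LLt = L @ L.T

# symmetric square root of LL^T from its spectral decomposition
vals, vecs = np.linalg.eigh(LLt)
root = vecs @ np.diag(np.sqrt(vals)) @ vecs.T

sel = np.hstack([np.eye(u), np.zeros((u, N - u))])   # the matrix (I_u 0)
B = root @ sel                                        # (LL^T)^{1/2} (I_u 0)

# L W and B W have c.f. c(t^T L L^T t / r) and c(t^T B B^T t / r) respectively
same_scale = np.allclose(B @ B.T, LLt)

# the u largest eigenvalues of L^T L are the eigenvalues of L L^T
top = np.sort(np.linalg.eigvalsh(L.T @ L))[-u:]
same_spectrum = np.allclose(top, np.sort(vals))
```

Both flags evaluate to true for a full row rank L; the check is deterministic and does not simulate W.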

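Proof of Theorem 3.3.4. Let $HY^{(1)}$ be a LUP of
$\xi(Y^{(1)}, Y^{(2)})$. Then, under the linear model (2.2.2) and
from the fact that $E(e^*) = 0$, it is easy to see that

$$(H - A)X^{(1)} = CX^{(2)}. \tag{3.3.25}$$

Writing $W = \Sigma^{-1/2}W^*$ and using (3.3.24) and (3.3.25) one
gets

$$
\begin{aligned}
&[HY^{(1)} - \xi(Y^{(1)}, Y^{(2)})]^T\,\Omega\,[HY^{(1)} - \xi(Y^{(1)}, Y^{(2)})] \\
&\quad = W^{*T}[H - A \;\; -C]^T\,\Omega\,[H - A \;\; -C]\,W^* \\
&\quad = W^T\Sigma^{1/2}[H - A \;\; -C]^T\,\Omega\,[H - A \;\; -C]\,\Sigma^{1/2}W \\
&\quad \overset{d}{=} W_u^T\,\Omega^{1/2}[H - A \;\; -C]\,\Sigma\,[H - A \;\; -C]^T\Omega^{1/2}\,W_u.
\end{aligned}
\tag{3.3.26}
$$

Similarly,

$$
\begin{aligned}
&[e_{BF}(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)})]^T\,\Omega\,[e_{BF}(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)})] \\
&\quad = W^{*T}[CM \;\; -C]^T\,\Omega\,[CM \;\; -C]\,W^* \\
&\quad = W^T\Sigma^{1/2}[CM \;\; -C]^T\,\Omega\,[CM \;\; -C]\,\Sigma^{1/2}W \\
&\quad \overset{d}{=} W_u^T\,\Omega^{1/2}[CM \;\; -C]\,\Sigma\,[CM \;\; -C]^T\Omega^{1/2}\,W_u.
\end{aligned}
\tag{3.3.27}
$$

Write $\Gamma = H - A - CM$. Then,

$$\text{rhs of (3.3.26)} = \text{rhs of (3.3.27)}
 + W_u^T\,\Omega^{1/2}\,\Gamma\,\Sigma_{11}\,\Gamma^T\,\Omega^{1/2}\,W_u, \tag{3.3.28}$$

since using the definition of M given in (2.3.5),

$$
\begin{aligned}
(CM \;\; -C)\,\Sigma\,(\Gamma \;\; 0)^T
&= C(M \;\; -I)\begin{pmatrix}\Sigma_{11} \\ \Sigma_{21}\end{pmatrix}\Gamma^T
 = C(M\Sigma_{11} - \Sigma_{21})\Gamma^T \\
&= C(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})
   (X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1}(\Gamma X^{(1)})^T
\end{aligned}
\tag{3.3.29}
$$

and using (3.3.25),

$$\Gamma X^{(1)} = (H - A - CM)X^{(1)} = C(X^{(2)} - MX^{(1)}) = 0. \tag{3.3.30}$$

Theorem 3.3.4 follows now from (3.3.26) - (3.3.28).
Also, since $\Sigma_{11}$ is positive definite, it follows from
(3.3.28) that rhs of (3.3.26) = rhs of (3.3.27) if and only
if $\Gamma = 0$, that is H = A + CM (cf. Hwang, 1985).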


3.4    Best Unbiased Prediction and Stochastic
Domination in Infinite Population

In this section we will briefly consider a few optimal
properties of $e_{BI}(Y)$ which are similar to those of
$e_{BF}(Y^{(1)})$, following closely Section 3.3. First, we note
that $e_{BI}(Y)$ is optimal within the class of all unbiased
predictors of $\zeta(b, v)$ under the normal linear model (2.2.1)
with $\lambda$ known. As in the finite population case, no prior
distribution on B and R is assigned, and $\theta = (b^T, r)^T$ is
treated as an unknown parameter. Next, dispensing with the
distributional assumption of v and e, we note that $e_{BI}(Y)$
is BLUP for $\zeta(b, v)$.

We start with the following definition.

Definition 3.4.1. A predictor $\delta(Y)$ of $\zeta(b, v)$ is said to
be an unbiased predictor if $E_\theta[\delta(Y) - \zeta(b, v)] = 0$
for all $\theta$. An unbiased predictor $\gamma(Y)$ of $\zeta(b, v)$ is
said to be a BUP if for every unbiased predictor $\delta(Y)$ of
$\zeta(b, v)$,
$V_\theta[\delta(Y) - \zeta(b, v)] - V_\theta[\gamma(Y) - \zeta(b, v)]$
is n.n.d. for all $\theta$, provided the quantities exist finitely.

Recall  that  W  =  (")•   The  following  lemma  is  analogous 
to  Lemma  3.3.1,  and  concerns  the  characterization  of  a  BUP 
of  g(W)  based  on  Y  for  some  known  function  g  where  each 
component  of  g  has  a  finite  second  moment. 

Lemma 3.4.1. An unbiased predictor $\gamma(Y)$ of g(W) with
$E_\theta[\gamma^T(Y)\gamma(Y)] < \infty$ is BUP for g(W) if and only if
$\mathrm{Cov}_\theta[\gamma(Y) - g(W),\; m(Y)] = 0$ for all $\theta$ and for
every statistic m(Y) such that $E_\theta(m(Y)) = 0$ and
$E_\theta[m^2(Y)] < \infty$ for all $\theta$.

Lemma 3.4.1 can be proved similarly to Lemma 3.3.1 and
the proof is omitted. We will use this lemma to sketch a
proof of the following theorem which concerns best unbiased
prediction of $\zeta(b, v)$.

Theorem 3.4.1. Under the normal linear model (2.2.1),
$e_{BI}(Y)$ is the BUP of $\zeta(b, v)$.

Proof of Theorem 3.4.1. In view of Lemma 3.4.1, and the
fact that $E_\theta[m(Y)] = 0$ for all $\theta$, it suffices to show that

$$E_\theta\big[\{e_{BI}(Y) - \zeta(b, v)\}\,m(Y)\big] = 0 \text{ for all } \theta. \tag{3.4.1}$$

Note, however, that with $P_\theta$-probability 1,

$$
\begin{aligned}
E_\theta[e_{BI}(Y) - \zeta(b, v) \mid Y]
&= e_{BI}(Y) - Sb - TDZ^T\Sigma^{-1}(Y - Xb) \\
&= \big[S(X^T\Sigma^{-1}X)^{-1} - TDZ^T\Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}\big]\,X^T\Sigma^{-1}(Y - Xb).
\end{aligned}
\tag{3.4.2}
$$

From (3.4.1) and (3.4.2) it suffices to show that

$$E_\theta\big[(X^T\Sigma^{-1}Y)\,m(Y)\big] = 0 \text{ for all } \theta.$$

This is proved similarly to (3.3.6).
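
The algebra in (3.4.2) can be checked numerically. The sketch below assumes that $e_{BI}(Y)$ has the plug-in form $S\hat{b} + TDZ^T\Sigma^{-1}(Y - X\hat{b})$ with $\hat{b}$ the generalized least squares estimate; that form is an assumption consistent with (3.4.2), not a formula quoted from the text, and all sizes and matrices are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
N, p, q, u = 8, 2, 3, 2                    # hypothetical dimensions
X = rng.normal(size=(N, p))
Z = rng.normal(size=(N, q))
D = np.diag(rng.uniform(0.5, 2.0, size=q))
Psi = np.diag(rng.uniform(0.5, 2.0, size=N))
Sigma = Psi + Z @ D @ Z.T
S = rng.normal(size=(u, p))
T = rng.normal(size=(u, q))
b = rng.normal(size=p)
Y = rng.normal(size=N)

Si = np.linalg.inv(Sigma)
XtSiX = X.T @ Si @ X
b_hat = np.linalg.solve(XtSiX, X.T @ Si @ Y)

# assumed plug-in form of e_BI(Y)
e_bi = S @ b_hat + T @ D @ Z.T @ Si @ (Y - X @ b_hat)

# first and second lines of (3.4.2)
lhs = e_bi - S @ b - T @ D @ Z.T @ Si @ (Y - X @ b)
coef = (S - T @ D @ Z.T @ Si @ X) @ np.linalg.inv(XtSiX)
rhs = coef @ (X.T @ Si @ (Y - X @ b))
```

The two sides agree for any Y and b, which shows that the conditional bias depends on Y only through $X^T\Sigma^{-1}(Y - Xb)$; this is what reduces (3.4.1) to the orthogonality statement for $X^T\Sigma^{-1}Y$.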


Remark 3.4.1. The conclusion of the above theorem holds
even for certain nonnormal distributions. As in Theorem
3.3.2, one can show that $e_{BI}(Y)$ is BUP of $\zeta(b, v)$ under the
model (2.2.1) where $e^* \mid R = r \sim N(0, r^{-1}\Lambda)$ and R has a
df from $\mathcal{F}^*$, where $e^*$, $\Lambda$ and $\mathcal{F}^*$ are the
same as in Theorem 3.3.2.

Next, note that the predictor $e_{BI}(Y)$ is linear in Y
and it can be proved as in Theorem 3.3.3 that it is BLUP of
$\zeta(b, v)$ under linear model (2.2.1) without any
distributional assumption on $e^*$. But we need to assume $e^*$
has mean vector 0 and finite p.d. variance-covariance
matrix.

Now we will show that $e_{BI}(Y)$ dominates universally all
LUPs of $\zeta(b, v)$ for an elliptically symmetric distribution
of $e^*$. Consider the generalized Euclidean error loss
w.r.t. a u x u p.d. matrix $\Omega$,

$$L_1(\zeta, \delta) = \|\delta - \zeta\|_\Omega^2 = (\delta - \zeta)^T\Omega(\delta - \zeta). \tag{3.4.3}$$

Let $R_L(\theta, \zeta; \delta) = E_\theta\big[L\big(\|\delta(Y) - \zeta(b, v)\|_\Omega^2\big)\big]$
be the risk function of the predictor $\delta$ for predicting
$\zeta$ under a loss function which is a function of generalized
Euclidean error w.r.t. $\Omega$ for some function L. The
following definition is similar to Definition 3.3.3.

Definition 3.4.2. An estimator $\delta_1(Y)$ universally dominates
another estimator $\delta_2(Y)$ (under the generalized Euclidean
error w.r.t. $\Omega$) if for every $\theta$ and every nondecreasing
function L, $R_L(\theta, \zeta; \delta_1) \le R_L(\theta, \zeta; \delta_2)$
holds and for a particular loss, the risk functions are not
identical.

Now we will state the following theorem on stochastic
domination of $e_{BI}(Y)$. Its proof will be omitted because of
its similarity to Theorem 3.3.4.


Theorem 3.4.2. Under the model (2.2.1) and an elliptically
symmetric distribution of $e^*$ similar to the ones given in
(3.3.16) and (3.3.17), $e_{BI}(Y)$ universally dominates every
LUP $\delta(Y) = HY$ of $\zeta(b, v)$ for every p.d. $\Omega$.

Remark 3.4.2. The version of Remark 3.3.7 in the present
context is that neither the BLUP property of $e_{BI}(Y)$ nor the
stochastic domination of $e_{BI}(Y)$ as given by Theorem 3.4.2
implies the other. The latter requires elliptic symmetry of
distributions, while the former does not. On the other
hand, the assumptions on $e^*$ needed are similar to the ones
given by (3.3.16) and (3.3.17), and do not necessarily
imply finiteness of the second moments of the components of
$e^*$.

3.5   Best Equivariant Prediction in
Small Area Estimation

This section is devoted to equivariant prediction of
$\xi(Y^{(1)}, Y^{(2)})$ on the basis of $Y^{(1)}$ under suitable groups
of transformations. First we will assume the normal linear
model given in (2.2.1). To motivate the first group of
transformations as well as equivariant prediction, first
consider an estimation problem. Under the normal linear
model, $Y \sim N(Xb, r^{-1}\Sigma)$ where we may recall that
$\Sigma = \Psi + ZDZ^T$ is known. Before proceeding further we will
define $\sigma^2 = r^{-1}$ and redefine $\theta = (\sigma, b^T)^T$. We are
interested in equivariant estimation of
$AX^{(1)}b + CX^{(2)}b$ where the matrices A, C, $X^{(1)}$ and
$X^{(2)}$ are as introduced earlier. It suffices to estimate the
vector b where Xb can be viewed as a location vector. Note
that $Y + X\beta \sim N(X(b + \beta), \sigma^2\Sigma)$ and it has new
location vector $X(b + \beta)$. We are interested in developing
an estimator of b which remains equivariant under a new
origin of measurement. So a "natural" group of
transformations will be

$$\mathcal{G}_1 = \{g_\beta:\; \beta \in R^p;\; g_\beta(y) = y + X\beta\}. \tag{3.5.1}$$

Now assume that we partition y, X as in (2.2.2) and only
$y^{(1)}$ is observed (which is the case in small area
estimation). If $\delta(y^{(1)})$ estimates
$(AX^{(1)} + CX^{(2)})b$, then
$\delta(y^{(1)}) + AX^{(1)}\beta + CX^{(2)}\beta$ should estimate
$(AX^{(1)} + CX^{(2)})(b + \beta) = (A \;\; C)X(b + \beta)$. Treating
$X(b + \beta)$ as the new location parameter one can expect that
$\delta(y^{(1)} + X^{(1)}\beta)$ (estimator based on the "observed
part" of $y + X\beta$) will estimate
$(AX^{(1)} + CX^{(2)})(b + \beta)$. So we should have

$$\delta(y^{(1)} + X^{(1)}\beta) = \delta(y^{(1)}) + AX^{(1)}\beta + CX^{(2)}\beta \tag{3.5.2}$$

for all $y^{(1)}$, $\beta$. Now if we are interested in estimating
$\xi(y^{(1)}, Y^{(2)})$, instead of
$E_\theta(\xi(Y^{(1)}, Y^{(2)})) = AX^{(1)}b + CX^{(2)}b$,
we can still use $\delta(y^{(1)})$ and again we will impose (3.5.2)
on $\delta$.

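Note that the induced group of transformations on the
parameter space

$$\Theta = \{\theta:\; \theta = (\sigma, b^T)^T,\; b \in R^p,\; \sigma > 0\} \tag{3.5.3}$$

is given by

$$\bar{\mathcal{G}}_1 = \{\bar{g}_\beta:\; \beta \in R^p;\; \bar{g}_\beta(\theta) = (\sigma, (b + \beta)^T)^T\}. \tag{3.5.4}$$

A loss function $L(\xi, \theta; \delta)$ is said to be invariant if

$$
\begin{aligned}
&L\big(\xi(y^{(1)} + X^{(1)}\beta,\; y^{(2)} + X^{(2)}\beta),\; \bar{g}_\beta(\theta);\;
   \delta(y^{(1)}) + AX^{(1)}\beta + CX^{(2)}\beta\big) \\
&\quad = L\big(\xi(y^{(1)}, y^{(2)}),\; \theta;\; \delta(y^{(1)})\big)
\end{aligned}
\tag{3.5.5}
$$

for all $y^{(1)}$, $y^{(2)}$, $\theta$ and $\beta$. An estimator $\delta$
satisfying (3.5.2) is said to be equivariant.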

We will now be interested in the best equivariant
prediction of $\xi(Y^{(1)}, Y^{(2)})$. The following lemma provides a
useful characterization of the class of equivariant
predictors of $\xi(Y^{(1)}, Y^{(2)})$. Its proof is standard and is
omitted.

Lemma 3.5.1. Let $\delta_0(Y^{(1)})$ be an equivariant predictor for
$\xi(Y^{(1)}, Y^{(2)})$. Then a necessary and sufficient condition
for a predictor $\delta(Y^{(1)})$ of $\xi(Y^{(1)}, Y^{(2)})$ to be
equivariant is that

$$\delta(y^{(1)}) = \delta_0(y^{(1)}) + h(y^{(1)}) \tag{3.5.6}$$

for all $y^{(1)}$, where

$$h(y^{(1)} + X^{(1)}\beta) = h(y^{(1)}) \tag{3.5.7}$$

for all $y^{(1)}$ and all $\beta$.

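A function $h$ satisfying (3.5.7) is said to be
invariant and hence must be a function of a maximal
invariant function (see, for example, p. 285 of Lehmann,
1986) under the group of transformations

$$\mathcal{G}_1^{(1)} = \{g_\beta^{(1)},\ \beta \in R^p : g_\beta^{(1)}(y^{(1)}) = y^{(1)} + X^{(1)}\beta\} \tag{3.5.8}$$

induced by $\mathcal{G}_1$ in the $y^{(1)}$-space. The next lemma finds a
maximal invariant function under the group of
transformations $\mathcal{G}_1^{(1)}$.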

Lemma 3.5.2. A maximal invariant function under the group
of transformations $\mathcal{G}_1^{(1)}$ is $Ky^{(1)}$, where $K$ is defined in (2.3.4).

Proof of Lemma 3.5.2. First we show that $Ky^{(1)}$ is an
invariant function. From the definition of $K$, we have
$KX^{(1)} = 0$ and hence $K(y^{(1)} + X^{(1)}\beta) = Ky^{(1)}$ for all $y^{(1)}$ and
$\beta$. Now we will prove the maximality. For $y_1^{(1)}$, $y_2^{(1)}$ such
that $Ky_1^{(1)} = Ky_2^{(1)}$, the difference $y_1^{(1)} - y_2^{(1)}$ is orthogonal to the
row space of $K$. Also $\mathrm{rank}(K) = n_T - p$ and $\mathrm{rank}(X^{(1)}) = p$.
Again using $KX^{(1)} = 0$ it follows that the column space of
$X^{(1)}$ forms an orthocomplement of the row space of $K$.
Hence, $y_1^{(1)} - y_2^{(1)} = X^{(1)}\beta$ for some $\beta$, i.e., $y_1^{(1)} =
y_2^{(1)} + X^{(1)}\beta$, proving thereby the maximality of $Ky^{(1)}$.
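The invariance step in the proof above can be illustrated numerically. The definition (2.3.4) of $K$ is not reproduced in this section, so the sketch below assumes the simplest matrix with the two properties the proof uses ($KX^{(1)} = 0$ and rank $n_T - p$): it takes $\Sigma_{11} = I$ and a single regressor column `x`, giving $K = I - xx^T/(x^Tx)$. The function names and the numbers are illustrative only and are not taken from the dissertation.

```python
def make_K(x):
    """Assumed form K = I - x x^T / (x^T x), i.e. Sigma_11 = I and p = 1."""
    n = len(x)
    xx = sum(v * v for v in x)
    return [[(1.0 if i == j else 0.0) - x[i] * x[j] / xx for j in range(n)]
            for i in range(n)]


def matvec(A, v):
    """Plain matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


x = [1.0, 2.0, 3.0, 4.0]      # hypothetical regressor column X^(1)
y = [0.5, -1.0, 2.0, 0.3]     # hypothetical observed vector y^(1)
beta = 7.25                   # arbitrary shift parameter

K = make_K(x)
y_shifted = [yi + beta * xi for yi, xi in zip(y, x)]

Ky = matvec(K, y)
Ky_shifted = matvec(K, y_shifted)

# Largest coordinate-wise change in K y caused by the shift y -> y + x*beta.
max_gap = max(abs(a - b) for a, b in zip(Ky, Ky_shifted))
# K x should vanish, which is what drives the invariance.
max_Kx = max(abs(v) for v in matvec(K, x))
```

Under these assumptions `max_gap` and `max_Kx` are zero up to floating-point rounding, which is the statement $K(y^{(1)} + X^{(1)}\beta) = Ky^{(1)}$ in a small concrete case.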

Since $e_{BF}(Y^{(1)})$ is an equivariant predictor of
$\xi(Y^{(1)}, Y^{(2)})$, using Lemmas 3.5.1 and 3.5.2 we can
characterize the class of all equivariant predictors of
$\xi(Y^{(1)}, Y^{(2)})$. This is given in the next lemma.

Lemma 3.5.3. Under the group of transformations $\mathcal{G}_1$ given
in (3.5.1), a predictor $\delta(Y^{(1)})$ of $\xi(Y^{(1)}, Y^{(2)})$ is an
equivariant predictor if and only if $\delta$ has the
representation

$$\delta(y^{(1)}) = e_{BF}(y^{(1)}) + s(Ky^{(1)}) \tag{3.5.9}$$

for all $y^{(1)}$, where $s$ is an $n_T$-variate $u$-component vector
valued function.

Recalling the definition of an invariant loss function
$L(\xi, \theta; \delta)$ given in (3.5.5), we see that both the matrix
loss $L_0$ and the quadratic loss $L_1$, given in (3.3.14) and
(3.3.15) respectively, satisfy (3.5.5).

Definition 3.5.1. An equivariant predictor $\delta_0(Y^{(1)})$ of
$\xi(Y^{(1)}, Y^{(2)})$ is said to be a best equivariant predictor
under the loss $L_0$ if for every other equivariant predictor
$\delta(Y^{(1)})$ of $\xi(Y^{(1)}, Y^{(2)})$, $E_\theta[L_0(\xi, \delta) - L_0(\xi, \delta_0)]$ is n.n.d.

Remark 3.5.1. Note that if $\delta_0(Y^{(1)})$ is the best equivariant
predictor of $\xi(Y^{(1)}, Y^{(2)})$ under the loss $L_0$, then it is so
under the loss $L_1$ for every n.n.d. $\Omega$. Conversely, if
$\delta_0(Y^{(1)})$ is the best equivariant predictor of $\xi(Y^{(1)}, Y^{(2)})$
under the loss $L_1$ for every n.n.d. $\Omega$, then it is so under
$L_0$. We state and prove Theorem 3.5.1 which establishes the
optimality of $e_{BF}(Y^{(1)})$ within the class of all equivariant
predictors $\delta(Y^{(1)})$ of $\xi(Y^{(1)}, Y^{(2)})$ under the loss $L_0$, and a
fortiori under the loss $L_1$ for every n.n.d. $\Omega$.

Theorem 3.5.1. Under the normal linear model given in
(2.2.2), the group of transformations $\mathcal{G}_1$ given in (3.5.1),
and the loss $L_0$ given in (3.3.14), the best equivariant
predictor of $\xi(Y^{(1)}, Y^{(2)})$ is given by $e_{BF}(Y^{(1)})$.

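Proof of Theorem 3.5.1. Note that from (3.5.9) of Lemma
3.5.3, any equivariant predictor $\delta(Y^{(1)})$ of $\xi(Y^{(1)}, Y^{(2)})$ is
given by $\delta(Y^{(1)}) = e_{BF}(Y^{(1)}) + s(KY^{(1)})$. Only those $\delta$ are to
be considered for which $E_\theta[L_0(\xi, \delta)]$ is finite. Note that
$E_\theta[L_0(\xi, e_{BF})]$ exists finitely and hence

$$E_\theta[s^T(KY^{(1)})\, s(KY^{(1)})] \le 2\,\mathrm{tr}\{E_\theta[L_0(\xi, e_{BF})] + E_\theta[L_0(\xi, \delta)]\} < \infty.$$

Hence by the Cauchy-Schwarz inequality $E_\theta[s(KY^{(1)})\, s^T(KY^{(1)})]$ is
finite. Under the matrix loss $L_0$,

$$E_\theta[L_0(\xi, \delta) - L_0(\xi, e_{BF})] = E_\theta[s(KY^{(1)})\, s^T(KY^{(1)})]
+ E_\theta[(e_{BF}(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)}))\, s^T(KY^{(1)})]
+ E_\theta[s(KY^{(1)})\, (e_{BF}(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)}))^T]. \tag{3.5.11}$$

Now we will show that the last two product terms are null
matrices. Using the definitions of $e_{BF}(Y^{(1)})$ and $M$, we
have

$$E_\theta[e_{BF}(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)}) \mid Y^{(1)}]
= C(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})(X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1}
(X^{(1)T}\Sigma_{11}^{-1}Y^{(1)} - X^{(1)T}\Sigma_{11}^{-1}X^{(1)}b) \quad \text{a.e. Lebesgue.} \tag{3.5.12}$$

Now $\mathrm{Cov}_\theta[X^{(1)T}\Sigma_{11}^{-1}Y^{(1)}, KY^{(1)}] = \sigma^2 X^{(1)T}\Sigma_{11}^{-1}\Sigma_{11}K^T = \sigma^2 X^{(1)T}K^T
= 0$. Since $Y$ is multivariate normal and $X^{(1)T}\Sigma_{11}^{-1}Y^{(1)}$
and $KY^{(1)}$ are linear functions of $Y^{(1)}$, so $X^{(1)T}\Sigma_{11}^{-1}Y^{(1)}$ and
$KY^{(1)}$ are independently distributed. Now using (3.5.12)
and independence of $X^{(1)T}\Sigma_{11}^{-1}Y^{(1)}$ and $KY^{(1)}$, we have

$$E_\theta[(e_{BF}(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)}))\, s^T(KY^{(1)})]$$
$$= C(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})(X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1}
\times E_\theta[(X^{(1)T}\Sigma_{11}^{-1}Y^{(1)} - X^{(1)T}\Sigma_{11}^{-1}X^{(1)}b)\, s^T(KY^{(1)})]$$
$$= C(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})(X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1}
\times E_\theta(X^{(1)T}\Sigma_{11}^{-1}Y^{(1)} - X^{(1)T}\Sigma_{11}^{-1}X^{(1)}b)\, E_\theta(s^T(KY^{(1)}))
= 0. \tag{3.5.13}$$

Now from (3.5.11) and (3.5.13),

$$E_\theta[L_0(\xi, \delta) - L_0(\xi, e_{BF})] = E_\theta[s(KY^{(1)})\, s^T(KY^{(1)})] \ge 0, \tag{3.5.14}$$

with equality if and only if $P_\theta[s(KY^{(1)}) = 0] = 1$, i.e.,
$P_\theta[\delta(Y^{(1)}) = e_{BF}(Y^{(1)})] = 1$.
The proof of Theorem 3.5.1 is complete.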

Next, instead of $\mathcal{G}_1$, we consider the group $\mathcal{G}_2$ of
transformations given by

$$\mathcal{G}_2 = \{g_{\beta,d},\ \beta \in R^p,\ d > 0 : g_{\beta,d}(y) = dy + X\beta\}. \tag{3.5.15}$$

A predictor $\delta(Y^{(1)})$ of $\xi(Y^{(1)}, Y^{(2)})$ is said to be
equivariant under the group $\mathcal{G}_2$ of transformations if

$$\delta(dy^{(1)} + X^{(1)}\beta) = d\,\delta(y^{(1)}) + (AX^{(1)} + CX^{(2)})\beta \tag{3.5.16}$$

for all $y^{(1)}$, $\beta$ and $d > 0$. Note that $e_{BF}(Y^{(1)})$ is again an
equivariant predictor of $\xi(Y^{(1)}, Y^{(2)})$ under the above
criterion. Also, note that if $\delta$ is equivariant under $\mathcal{G}_2$,
taking $d = 1$ in (3.5.16) it follows that it is equivariant
under $\mathcal{G}_1$. So the class of equivariant predictors under $\mathcal{G}_2$
is smaller than that under $\mathcal{G}_1$. But we will prove that
$e_{BF}(Y^{(1)})$ is best inside this smaller class under a larger
family of distributions of $Y$ which includes the normal
distribution as a special case.

We will apply the group of transformations $\mathcal{G}_2$ on the
family of elliptically symmetric distributions with pdf
given by

$$f_\theta(y) \propto |\sigma^2\Sigma|^{-1/2} f\big((y - Xb)^T\Sigma^{-1}(y - Xb)/\sigma^2\big), \tag{3.5.17}$$

where $f$ is a known nonnegative function satisfying

$$\int [1 + (y - Xb)^T\Sigma^{-1}(y - Xb)]\, f\big((y - Xb)^T\Sigma^{-1}(y - Xb)/\sigma^2\big)\, dy < \infty. \tag{3.5.18}$$

Note that $Y \sim N(Xb, \sigma^2\Sigma)$ is a special case satisfying
(3.5.17) and (3.5.18). It is easy to see that the above
group of transformations $\mathcal{G}_2$ on $N_T$-dimensional Euclidean
space for $Y$ induces a group of transformations

$$\bar{\mathcal{G}}_2 = \{\bar g_{\beta,d},\ \beta \in R^p,\ d > 0 : \bar g_{\beta,d}(\theta) = (d\sigma, db^T + \beta^T)^T\} \tag{3.5.19}$$

on the parameter space $\Theta$ given in (3.5.3).

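As before, a loss function $L(\xi, \theta; \delta)$ for predicting
$\xi(Y^{(1)}, Y^{(2)})$ by $\delta(Y^{(1)})$ is invariant under the group of
transformations $\mathcal{G}_2$ if

$$L\big(d\,\xi(y^{(1)}, y^{(2)}) + (AX^{(1)} + CX^{(2)})\beta,\ \bar g_{\beta,d}(\theta);\ d\,\delta(y^{(1)}) + (AX^{(1)} + CX^{(2)})\beta\big)
= L\big(\xi(y^{(1)}, y^{(2)}),\ \theta;\ \delta(y^{(1)})\big) \tag{3.5.20}$$

for all $y^{(1)}$, $y^{(2)}$, $\beta$, $d$ ($> 0$) and $\theta$.

We shall consider the special losses

$$L_2(\xi, \theta; \delta) = L_0(\xi, \delta)/\sigma^2 \tag{3.5.21}$$

and

$$L_3(\xi, \theta; \delta) = L_1(\xi, \delta)/\sigma^2. \tag{3.5.22}$$

Both these losses satisfy (3.5.20).

To find the best equivariant predictor of $\xi(Y^{(1)}, Y^{(2)})$
in this setup, we will present two lemmas which provide a
useful characterization of the class of equivariant
predictors. A proof of Lemma 3.5.4 is omitted.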

Lemma 3.5.4. Let $\delta_0(Y^{(1)})$ be an equivariant predictor for
$\xi(Y^{(1)}, Y^{(2)})$. Then a necessary and sufficient condition
for a predictor $\delta(y^{(1)})$ of $\xi(y^{(1)}, Y^{(2)})$ to be equivariant is
that

$$\delta(y^{(1)}) = \delta_0(y^{(1)}) + h(y^{(1)}) \tag{3.5.23}$$

for all $y^{(1)}$, where

$$h(dy^{(1)} + X^{(1)}\beta) = d\,h(y^{(1)}) \tag{3.5.24}$$

for all $y^{(1)}$, $\beta$ and $d > 0$.

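We will follow the arguments of Lemmas 2 and 3 of
Datta and Ghosh (1988) to find a representation of the
function $h$ in (3.5.24). Define

$$t(Ky^{(1)}) = \frac{Ky^{(1)}}{(y^{(1)T}Ky^{(1)})^{1/2}}\; I_{[y^{(1)T}Ky^{(1)} > 0]}. \tag{3.5.25}$$

Note that since $K\Sigma_{11}K = K$, so $y^{(1)T}Ky^{(1)} = (Ky^{(1)})^T\Sigma_{11}(Ky^{(1)})$
and $t$ is indeed a function of $Ky^{(1)}$. It can be shown that
$t(Ky^{(1)})$ is a maximal invariant under the group of
transformations

$$\mathcal{G}_2^{(1)} = \{g_{\beta,d}^{(1)},\ \beta \in R^p,\ d > 0 : g_{\beta,d}^{(1)}(y^{(1)}) = dy^{(1)} + X^{(1)}\beta\} \tag{3.5.26}$$

induced by $\mathcal{G}_2$ in the $y^{(1)}$-space.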
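A small numerical sketch may help to see why $t(Ky^{(1)})$ in (3.5.25) is unaffected by both the shift and the rescaling in $\mathcal{G}_2$. As (2.3.4) is not reproduced here, the code assumes $\Sigma_{11} = I$ and a single regressor column `x`, so that $K = I - xx^T/(x^Tx)$ (which satisfies $KX^{(1)} = 0$ and $K\Sigma_{11}K = K$). The helper names and numbers are hypothetical.

```python
import math


def project_out(x, y):
    """Return K y for the assumed K = I - x x^T / (x^T x)."""
    coef = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    return [yi - coef * xi for xi, yi in zip(x, y)]


def t_stat(x, y):
    """t(Ky) = Ky / (y^T K y)^{1/2} when y^T K y > 0, and the zero vector otherwise."""
    Ky = project_out(x, y)
    q = sum(yi * ki for yi, ki in zip(y, Ky))   # quadratic form y^T K y
    if q <= 0.0:
        return [0.0] * len(y)
    return [k / math.sqrt(q) for k in Ky]


x = [1.0, 2.0, 3.0, 4.0]      # hypothetical regressor column
y = [0.5, -1.0, 2.0, 0.3]     # hypothetical observed vector
d, beta = 3.5, -2.0           # arbitrary scale (d > 0) and shift

y_transformed = [d * yi + beta * xi for yi, xi in zip(y, x)]

t_original = t_stat(x, y)
t_transformed = t_stat(x, y_transformed)

max_gap = max(abs(a - b) for a, b in zip(t_original, t_transformed))
squared_norm = sum(v * v for v in t_original)
```

With $\Sigma_{11} = I$ the matrix $K$ is idempotent, so `squared_norm` equals 1 up to rounding; `max_gap` is zero up to rounding because the factor $d$ cancels between numerator and denominator.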

The following lemma characterizes the class of
functions $h(y^{(1)})$ satisfying (3.5.24). We will use this
lemma to characterize the class of equivariant predictors.

Lemma 3.5.5. A function $h(y^{(1)})$ ($u \times 1$) satisfies (3.5.24)
if and only if $h$ has the representation

$$h(y^{(1)}) = (y^{(1)T}Ky^{(1)})^{1/2}\, s(t(Ky^{(1)})), \tag{3.5.27}$$

where $s$ ($u \times 1$) is an arbitrary function of $t(Ky^{(1)})$.

Proof of Lemma 3.5.5. Assume $h$ has the representation
given by (3.5.27). Now, since
$(dy^{(1)} + X^{(1)}\beta)^T K (dy^{(1)} + X^{(1)}\beta) = d^2\, y^{(1)T}Ky^{(1)}$ and since $t(Ky^{(1)})$ is a
maximal invariant under $\mathcal{G}_2^{(1)}$, so $t(K(dy^{(1)} + X^{(1)}\beta)) = t(Ky^{(1)})$
and consequently

$$h(dy^{(1)} + X^{(1)}\beta) = d\,(y^{(1)T}Ky^{(1)})^{1/2}\, s(t(Ky^{(1)})) = d\,h(y^{(1)}).$$

Hence (3.5.24) is satisfied.

Only if. Since $h$ satisfies (3.5.24) for all $y^{(1)}$, $\beta$ and $d
> 0$, taking $d = 1$, we see that $h$ must satisfy
$h(y^{(1)} + X^{(1)}\beta) = h(y^{(1)})$ for all $y^{(1)}$ and $\beta$. This implies
that $h$ must be invariant under $\mathcal{G}_1^{(1)}$, and hence must be a
function of $Ky^{(1)}$. So $h(y^{(1)}) = s(Ky^{(1)})$ where $s$ ($u \times 1$) is
an arbitrary function satisfying

$$s(K(dy^{(1)} + X^{(1)}\beta)) = d\,s(Ky^{(1)}),$$

i.e.,

$$s(dKy^{(1)}) = d\,s(Ky^{(1)}) \tag{3.5.28}$$

for all $d > 0$. Now taking $d = (y^{(1)T}Ky^{(1)})^{-1/2}$ for
$y^{(1)T}Ky^{(1)} > 0$ we have from (3.5.28)

$$h(y^{(1)}) = s(Ky^{(1)}) = (y^{(1)T}Ky^{(1)})^{1/2}\, s(t(Ky^{(1)})).$$

Now for $y^{(1)T}Ky^{(1)} = 0$, if we take $h = 0$, then (3.5.24) is
satisfied and we can represent $h$ by (3.5.27).

Since $e_{BF}(Y^{(1)})$ is an equivariant predictor of
$\xi(Y^{(1)}, Y^{(2)})$, it follows from Lemma 3.5.4 and Lemma 3.5.5
that $\delta(Y^{(1)})$ is an equivariant predictor of $\xi(Y^{(1)}, Y^{(2)})$
under the group $\mathcal{G}_2$ of transformations if and only if

$$\delta(y^{(1)}) = e_{BF}(y^{(1)}) + (y^{(1)T}Ky^{(1)})^{1/2}\, s(t(Ky^{(1)})) \tag{3.5.29}$$

for all $y^{(1)}$.

Definition 3.5.2. An equivariant predictor $\delta_0(Y^{(1)})$ of
$\xi(Y^{(1)}, Y^{(2)})$ is said to be a best equivariant predictor
under the loss $L_2$ if for every other equivariant predictor
$\delta(Y^{(1)})$ of $\xi(Y^{(1)}, Y^{(2)})$, $E_\theta[L_2(\xi, \theta; \delta) - L_2(\xi, \theta; \delta_0)]$ is
n.n.d.

Remark 3.5.2. Note that if $\delta_0(Y^{(1)})$ is the best equivariant
predictor of $\xi(Y^{(1)}, Y^{(2)})$ under the loss $L_2$, then it is so
under the loss $L_3$ for every n.n.d. $\Omega$ and vice versa.

We now establish that $e_{BF}(Y^{(1)})$ is the best equivariant
predictor of $\xi(Y^{(1)}, Y^{(2)})$ under the loss $L_2$. The following
theorem is proved.

Theorem 3.5.2. Under the model given in (2.2.2) and the
family of elliptically symmetric distributions as given by
(3.5.17) and (3.5.18), the group of transformations $\mathcal{G}_2$
given in (3.5.15), and the loss $L_2$ given in (3.5.21), the
best equivariant predictor of $\xi(Y^{(1)}, Y^{(2)})$ is given by
$e_{BF}(Y^{(1)})$.

Proof of Theorem 3.5.2. Note that from (3.5.29) any
equivariant predictor $\delta(Y^{(1)})$ of $\xi(Y^{(1)}, Y^{(2)})$ is given by
$\delta(Y^{(1)}) = e_{BF}(Y^{(1)}) + (Y^{(1)T}KY^{(1)})^{1/2}\, s(t(KY^{(1)}))$. Only those
$\delta$ are considered for which $E_\theta[L_2(\xi, \theta; \delta)]$ is finite and
in that case for the corresponding $s$, we have that

$$E_\theta[(Y^{(1)T}KY^{(1)})\, s^T(t(KY^{(1)}))\, s(t(KY^{(1)}))]$$

is finite. Under the matrix loss $L_0$,

$$\sigma^2 E_\theta[L_2(\xi, \theta; \delta) - L_2(\xi, \theta; e_{BF})]
= E_\theta[(Y^{(1)T}KY^{(1)})\, s(t(KY^{(1)}))\, s^T(t(KY^{(1)}))]$$
$$+\ E_\theta[(e_{BF}(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)}))\,(Y^{(1)T}KY^{(1)})^{1/2}\, s^T(t(KY^{(1)}))]$$
$$+\ E_\theta[(Y^{(1)T}KY^{(1)})^{1/2}\, s(t(KY^{(1)}))\,(e_{BF}(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)}))^T]. \tag{3.5.30}$$
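It is enough to prove that the last two product terms in
(3.5.30) are null matrices. Proceeding as in the proof of
Theorem 3.5.1 we have

$$E_\theta[e_{BF}(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)}) \mid Y^{(1)}] = C[MY^{(1)} - E_\theta(Y^{(2)} \mid Y^{(1)})]. \tag{3.5.31}$$

Now from the property of elliptically symmetric
distributions, it follows that
$E_\theta(Y^{(2)} \mid Y^{(1)}) = X^{(2)}b + \Sigma_{21}\Sigma_{11}^{-1}(Y^{(1)} - X^{(1)}b)$ a.e. Lebesgue
and using this, we have from (3.5.31) that

$$E_\theta[e_{BF}(Y^{(1)}) - \xi(Y^{(1)}, Y^{(2)}) \mid Y^{(1)}]
= C(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})(X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1}
(X^{(1)T}\Sigma_{11}^{-1}Y^{(1)} - X^{(1)T}\Sigma_{11}^{-1}X^{(1)}b). \tag{3.5.32}$$

Then, the second term of the rhs of (3.5.30)

$$= C(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})(X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1}
\times E_\theta\big[(Y^{(1)T}KY^{(1)})^{1/2}(X^{(1)T}\Sigma_{11}^{-1}Y^{(1)} - X^{(1)T}\Sigma_{11}^{-1}X^{(1)}b)
\times s^T(t(KY^{(1)}))\big]. \tag{3.5.33}$$

Now we will show that the expectation on the rhs of (3.5.33)
is a null matrix. Note that since $E_\theta(Y^{(1)T}KY^{(1)}) < \infty$ and
$E_\theta(Y^{(1)}) = X^{(1)}b$, by the Cauchy-Schwarz inequality $(Y^{(1)T}KY^{(1)})^{1/2}
\times X^{(1)T}\Sigma_{11}^{-1}Y^{(1)}$ has finite expectation. Now using the
independence of $\big((Y^{(1)T}KY^{(1)})^{1/2},\ X^{(1)T}\Sigma_{11}^{-1}Y^{(1)}\big)$ and $t(KY^{(1)})$,
which follows as a special case of Lemma B.1, we have

$$E_\theta\big[(Y^{(1)T}KY^{(1)})^{1/2}(X^{(1)T}\Sigma_{11}^{-1}Y^{(1)} - X^{(1)T}\Sigma_{11}^{-1}X^{(1)}b)
\times s^T(t(KY^{(1)}))\big]$$
$$= E_\theta\big[(Y^{(1)T}KY^{(1)})^{1/2}X^{(1)T}\Sigma_{11}^{-1}(Y^{(1)} - X^{(1)}b)\big]
\times E_\theta\big[s^T(t(KY^{(1)}))\big]. \tag{3.5.34}$$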


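Next using the elliptical symmetry of $Y^{(1)}$, it follows that
$Y^{(1)} - X^{(1)}b$ and $-(Y^{(1)} - X^{(1)}b)$ have the same distribution.
Now since $KX^{(1)} = 0$ and $K$ is n.n.d., one has

$$(Y^{(1)T}KY^{(1)})^{1/2}X^{(1)T}\Sigma_{11}^{-1}(Y^{(1)} - X^{(1)}b)
\overset{d}{=} [(Y^{(1)} - X^{(1)}b)^T K (Y^{(1)} - X^{(1)}b)]^{1/2}
\times X^{(1)T}\Sigma_{11}^{-1}\{-(Y^{(1)} - X^{(1)}b)\},$$

i.e.,

$$(Y^{(1)T}KY^{(1)})^{1/2}X^{(1)T}\Sigma_{11}^{-1}(Y^{(1)} - X^{(1)}b)
\overset{d}{=} -(Y^{(1)T}KY^{(1)})^{1/2}X^{(1)T}\Sigma_{11}^{-1}(Y^{(1)} - X^{(1)}b). \tag{3.5.35}$$

Now taking expectations on both sides, which exist finitely
due to the previous observation, one gets

$$E_\theta\big[(Y^{(1)T}KY^{(1)})^{1/2}X^{(1)T}\Sigma_{11}^{-1}(Y^{(1)} - X^{(1)}b)\big] = 0. \tag{3.5.36}$$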

Hence from (3.5.30), (3.5.33), (3.5.34) and (3.5.36) we
have

$$\sigma^2 E_\theta[L_2(\xi, \theta; \delta) - L_2(\xi, \theta; e_{BF})]
= E_\theta\big[(Y^{(1)T}KY^{(1)})\, s(t(KY^{(1)}))\, s^T(t(KY^{(1)}))\big],$$

which is n.n.d. for all $\theta$, and the two risk matrices are
equal if and only if $P_\theta[s(t(KY^{(1)})) = 0] = 1$ for all $\theta$,
i.e., $P_\theta[\delta(Y^{(1)}) = e_{BF}(Y^{(1)})] = 1$ for all $\theta$.

Remark 3.5.3. Although $\big((Y^{(1)T}KY^{(1)})^{1/2},\ X^{(1)T}\Sigma_{11}^{-1}Y^{(1)}\big)$ is
sufficient and $t(KY^{(1)})$ is ancillary, Basu's Theorem (1955)
cannot be applied since the sufficient statistic is not
complete.

3.6 Best Equivariant Prediction
in Infinite Population

In this section, we concentrate on the equivariant
prediction of $\zeta(b, v) = Sb + Tv$ on the basis of the data
vector $Y$ under suitable groups of transformations. We will
first consider the prediction of $\zeta(b, v)$ under the normal
linear model (2.2.1) and the group of transformations $\mathcal{G}_1$
given in (3.5.1). Later on we will use the broader group
of transformations $\mathcal{G}_2$ and elliptically symmetric
distributions for $Y$ for equivariant prediction of $\zeta(b, v)$. Jeske
and Harville (1987) and Harville (in press) considered
equivariant prediction of a scalar linear combination of $b$
and $v$ under the group of transformations $\mathcal{G}_1$.

As before, we say a predictor $\delta(Y)$ of $\zeta(b, v)$ is
equivariant under the group of transformations $\mathcal{G}_1$ if

$$\delta(y + X\beta) = \delta(y) + S\beta \quad \text{for all } y \text{ and } \beta. \tag{3.6.1}$$

A loss function $L(\zeta, \theta; \delta)$ is said to be invariant if

$$L\big(\zeta(b, v) + S\beta,\ \bar g_\beta(\theta);\ \delta(y) + S\beta\big)
= L\big(\zeta(b, v),\ \theta;\ \delta(y)\big) \tag{3.6.2}$$

for all $y$, $v$, $\beta$ and $\theta$. Note that the matrix loss $L_0(\zeta, \delta)
= (\delta - \zeta)(\delta - \zeta)^T$ and the quadratic loss $L_1(\zeta, \delta) =
(\delta - \zeta)^T\Omega(\delta - \zeta)$ are invariant. Now we have the following
lemma similar to Lemma 3.5.1 which characterizes the class
of equivariant predictors of $\zeta(b, v)$ based on a specific
equivariant predictor $\delta_0(Y)$.

Lemma 3.6.1. A necessary and sufficient condition for a
predictor $\delta(Y)$ of $\zeta(b, v)$ to be equivariant is that

$$\delta(y) = \delta_0(y) + h(y) \tag{3.6.3}$$

for all $y$, where $\delta_0(Y)$ is a specific equivariant predictor
and the vector valued function $h$ satisfies

$$h(y + X\beta) = h(y) \tag{3.6.4}$$

for all $y$ and $\beta$.

The function $h$ is said to be invariant and hence it
must be a function of a maximal invariant function under
$\mathcal{G}_1$. So in the following lemma we find a maximal invariant
function under the group of transformations $\mathcal{G}_1$. This is
only a restatement of Lemma 3.5.2, so the proof is
omitted.

Lemma 3.6.2. A maximal invariant function under $\mathcal{G}_1$ is $Qy$
where $Q = \Sigma^{-1} - \Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}$ as defined in (2.3.12).

Since $e_{BI}(Y + X\beta) = e_{BI}(Y) + S\beta$, so $e_{BI}(Y)$ is an
equivariant predictor of $\zeta(b, v) = Sb + Tv$. This
observation and Lemmas 3.6.1 and 3.6.2 culminate in Lemma
3.6.3 which provides a useful characterization of the class
of equivariant predictors under $\mathcal{G}_1$.

Lemma 3.6.3. Under the group of transformations $\mathcal{G}_1$, any
equivariant predictor $\delta(Y)$ can be represented as

$$\delta(y) = e_{BI}(y) + s(Qy) \tag{3.6.5}$$

for all $y$, where $s$ is an $n_T$-variate $u$-component vector
valued function.

As in Definition 3.5.1, we say an equivariant
predictor $\delta_0(Y)$ of $\zeta(b, v)$ is best under the loss $L_0$ if for
every other equivariant predictor $\delta(Y)$, $E_\theta[L_0(\zeta, \delta) - L_0(\zeta, \delta_0)]$
is n.n.d. As in Remark 3.5.1 it is true that an
equivariant predictor of $\zeta(b, v)$ is best under the loss $L_0$
if and only if it is best under the loss $L_1$ for every
n.n.d. matrix $\Omega$. Similar to Theorem 3.5.1 we have the
following theorem under the assumption that $e^* = (v^T, e^T)^T \sim N(0,
\sigma^2\Delta)$ where $\Delta = \mathrm{Diag}(D, \Psi)$ is known p.d. and $\sigma^2 > 0$ is
unknown.

Theorem 3.6.1. Under the model (2.2.1) with the
distributional assumption on $e^*$ given above, the group of
transformations $\mathcal{G}_1$ and the matrix loss $L_0$, the best
equivariant predictor of $\zeta(b, v)$ is given by $e_{BI}(Y)$.

A proof of this theorem is similar to that of Theorem
3.5.1. Evaluating first the conditional expectation
$E_\theta[\zeta(b, v) \mid Y] = Sb + T\,E_\theta[v \mid Y]$, which is a linear function of
$Y$, one can proceed as in Theorem 3.5.1 to complete the
proof.

To conclude this section, we will consider the group
of transformations $\mathcal{G}_2$ and a class of elliptically symmetric
distributions to show that $e_{BI}(Y)$ is the best equivariant
predictor of $\zeta(b, v)$. Assume that $e^*$ has an elliptically
symmetric distribution with pdf given by

$$p(e^* \mid \sigma) \propto |\sigma^2\Delta|^{-1/2} f(e^{*T}\Delta^{-1}e^*/\sigma^2) \tag{3.6.6}$$

where $f$ is some known nonnegative function satisfying the
condition

$$\int [1 + e^{*T}e^*]\, f(e^{*T}\Delta^{-1}e^*/\sigma^2)\, de^* < \infty, \tag{3.6.7}$$

where $\Delta$ and $\sigma^2$ are the same as above. Now we have the
following theorem.

Theorem 3.6.2. Under the model (2.2.1) with the
distributional assumptions on $e^*$ given by (3.6.6) and
(3.6.7), the group of transformations $\mathcal{G}_2$ and the
standardized matrix loss $L_2(\zeta, \theta; \delta) = L_0(\zeta, \delta)/\sigma^2$, the best
equivariant predictor of $\zeta(b, v)$ is given by $e_{BI}(Y)$.

To avoid repetition of the arguments, a detailed
proof of this theorem is omitted. Proceeding exactly as in
Theorem 3.5.2 and Theorem 3.6.1 we can prove this theorem.


CHAPTER FOUR

ASYMPTOTIC OPTIMALITY OF
HIERARCHICAL BAYES PREDICTORS FOR MEANS

4.1 Introduction
We have introduced in the previous chapters several HB
predictors both in the context of finite and infinite
population sampling under general linear models. There we
have used a uniform prior (over the appropriate Euclidean
space) for the fixed effects and inverse gamma (possibly
improper) priors for the variance components to predict
$\gamma = (\gamma_1, \ldots, \gamma_m)^T$, the vector of finite population means. A
natural question to ask is, if indeed there is a "true"
or "elicited" prior, whether the HB predictor of $\gamma$ is, in
some sense, close to the "true" Bayes predictor, which we
shall refer to as the subjective Bayes predictor. Such a
comparison can be made conveniently in terms of Bayes risks
of these predictors computed under the elicited prior.
Following Robbins (1955), we shall call a predictor of $\gamma$
"asymptotically optimal" (A.O.) if the difference in the
Bayes risks of the predictor and the subjective Bayes
predictor converges to zero as $m \to \infty$. The A.O. property
of certain EB predictors arising naturally in the context
of finite population sampling was proved in Ghosh and
Meeden (1986), Ghosh and Lahiri (1987a), and Ghosh, Lahiri
and Tiwari (1989).

In this chapter, we prove the A.O. property of several
HB predictors under average squared error loss. To our
knowledge, such an attempt is the first of its kind. In
Section 4.2, we start with the normal linear model (2.2.2)
when $\lambda$, the vector of the ratios of variance components, is
known. We specify our loss function and the elicited prior
distribution, and derive the subjective Bayes predictor of
$\gamma$. In Subsection 4.2.1, we derive a general expression for
the difference of Bayes risks of our subjective Bayes
predictor and any predictor of $\gamma$. In particular, we
consider the sample mean vector and the HB predictor of $\gamma$
which can be derived from Section 3.2. In Subsection
4.2.2, we consider the random regression coefficients model
of Dempster et al. (1981) and in Subsection 4.2.3 the
nested error regression model of Battese et al. (1988) as
special cases of the general model of Subsection 4.2.1. We
show that the HB predictor is asymptotically optimal,
whereas the traditional sample mean vector is nonoptimal.

In  Section  4.3,  we  consider  the  Fay-Herriot  model  of 
Section  2.5  with  all  the  sampling  variances  Vj  =  RT   (say) 
known.   We  assume  for  the  sample  sizes  n^  ^i/"i  ^^^  ^^  ^ 
equal. We first show the A.O. property of the HB predictor
of $\theta$ ($m \times 1$) where $\theta$ is as given in Section 2.5. The result
is then used to prove the A.O. property of the HB predictor
of $\gamma$. The HB predictor of $\theta$ is the shrinkage estimator
obtained by shrinking the maximum likelihood estimator of $\theta$
towards a regression surface. The model we will consider
in Section 4.3 is a slight generalization of the ones given
in Morris (1981, 1983) and Ghosh (1989), and includes also
as special cases the ones considered in Strawderman (1971)
and Faith (1978).

The  assumption  of  known  first  stage  variance  component 
of  Section  4.3  is  dispensed  with  in  Section  4.4,  and  a 
prior  distribution  (proper  or  improper)  is  assigned  to  the 
first  stage  variance  component.   This  situation  is  now  a 
special  case  of  the  nested  error  regression  model  where  the 
covariate  vectors  associated  with  each  unit  within  a 
stratum  are  the  same.   In  Stroud  (1987)  (see  also  Ghosh  and 
Lahiri, in press), one can find such an example. Once
again, the A.O. property of the HB predictor of $\theta$ is proved.
The result is then used to prove the A.O. property of the
HB predictor of $\gamma$.

In the remainder of this section, we will introduce a
few notations. For a sequence of $a \times b$ matrices
$F(m) = ((f_{ij}(m)))$ and for an $a \times b$ matrix $F^0 = ((f^0_{ij}))$, we
say $\lim_{m\to\infty} F(m)$ exists and write $\lim_{m\to\infty} F(m) = F^0$ if
$\lim_{m\to\infty} f_{ij}(m) = f^0_{ij}$ for $i = 1, \ldots, a$, $j = 1, \ldots, b$. For an $a \times a$
matrix $T$, we denote its largest (smallest) eigenvalue by
$\mathrm{ch}_L(T)$ ($\mathrm{ch}_S(T)$).

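4.2 Model, Loss, Prior and Predictors

We start with the linear model (2.2.2), namely,

$$\begin{pmatrix} Y^{(1)} \\ Y^{(2)} \end{pmatrix}
= \begin{pmatrix} X^{(1)} \\ X^{(2)} \end{pmatrix} b
+ \begin{pmatrix} Z^{(1)} \\ Z^{(2)} \end{pmatrix} v
+ \begin{pmatrix} e^{(1)} \\ e^{(2)} \end{pmatrix} \tag{4.2.1}$$

where all the quantities appearing in (4.2.1) are the same
as in (2.2.2). We want to study the A.O. property of
predictors of $\gamma$, the finite population mean vector. Let
$A = \oplus_{i=1}^m (N_i^{-1} 1_{n_i}^T)$ and $C = \oplus_{i=1}^m (N_i^{-1} 1_{N_i - n_i}^T)$ where $n_i$ and $N_i$ are
the same as before. Then $\gamma = AY^{(1)} + CY^{(2)}$. Similarly,
write the sample mean vector $\bar Y_{(s)} = (\bar Y_{1(s)}, \ldots, \bar Y_{m(s)})^T$ as
$\bar Y_{(s)} = LY^{(1)}$ where $\bar Y_{i(s)} = n_i^{-1}\sum_{j=1}^{n_i} Y_{ij}$, $i = 1, \ldots, m$ and
$L = \oplus_{i=1}^m (n_i^{-1} 1_{n_i}^T)$.

Consider the quadratic loss function

$$L(\gamma, a) = m^{-1}(a - \gamma)^T Q_m (a - \gamma) \tag{4.2.2}$$

where $\gamma$ is estimated by the vector $a$ ($m \times 1$) and $Q_m$ is a
known $m \times m$ n.n.d. and nonnull matrix.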
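To make the loss (4.2.2) concrete, here is a minimal sketch that evaluates $m^{-1}(a - \gamma)^T Q_m (a - \gamma)$ for plain Python lists. The function names and the three-area example values are hypothetical; with $Q_m = I_m$ the quantity is just the average squared error over the $m$ areas.

```python
def quadratic_loss(gamma, a, Q):
    """Return m^{-1} (a - gamma)^T Q (a - gamma) for lists gamma, a and a nested-list Q."""
    m = len(gamma)
    diff = [ai - gi for ai, gi in zip(a, gamma)]
    Q_diff = [sum(Q[i][j] * diff[j] for j in range(m)) for i in range(m)]
    return sum(diff[i] * Q_diff[i] for i in range(m)) / m


def identity(m):
    """m x m identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]


gamma = [1.0, 2.0, 3.0]   # hypothetical finite population means
a = [1.5, 2.0, 2.0]       # hypothetical predictions

# Average squared error loss: Q_m = I_m.
loss_identity = quadratic_loss(gamma, a, identity(3))

# A different n.n.d. weight matrix that only scores the first area.
Q_first = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
loss_first = quadratic_loss(gamma, a, Q_first)
```

Here the differences are $(0.5, 0, -1)$, so `loss_identity` is $1.25/3$ and `loss_first` is $0.25/3$.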

In this section we assume $\lambda$ to be known, say $\lambda = \lambda_0$.
We will consider the elicited prior distribution $\pi_0$ under
which $Y \sim N(Xb_0, r_0^{-1}\Sigma)$ where $b_0$ ($p \times 1$), $r_0$ ($> 0$) and
$\Sigma = \Psi + ZD(\lambda_0)Z^T$ are known.

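Now the subjective Bayes predictor of $\gamma$ under the loss
(4.2.2) is given by

$$e_{SB}(Y^{(1)}) = AY^{(1)} + C\big[X^{(2)}b_0 + \Sigma_{21}\Sigma_{11}^{-1}(Y^{(1)} - X^{(1)}b_0)\big] \tag{4.2.3}$$

where $\Sigma_{ij}$, $i, j = 1, 2$ are the partitioned matrices of $\Sigma$.

For known $\lambda$, the HB predictor of $\gamma$ with independent
uniform($R^p$) prior for $B$ and gamma($\tfrac12 a_0$, $\tfrac12 g_0$) prior for $R$ and
the loss (4.2.2) follows from Section 3.2 and is given by

$$e_{BF}(Y^{(1)}) = (A + CM)Y^{(1)} \tag{4.2.4}$$

where we can recall from (2.3.5) that

$$M = \Sigma_{21}\Sigma_{11}^{-1} + (X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})(X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1}X^{(1)T}\Sigma_{11}^{-1}. \tag{4.2.5}$$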

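The Bayes risk $r_{Q_m}(\pi_0, \delta)$ of a predictor $\delta(Y^{(1)})$ of $\gamma = \gamma(Y)$
when the prior is $\pi_0$ is given by

$$r_{Q_m}(\pi_0, \delta) = E_{\pi_0}\big[m^{-1}(\delta(Y^{(1)}) - \gamma(Y))^T Q_m (\delta(Y^{(1)}) - \gamma(Y))\big]. \tag{4.2.6}$$

4.2.1 General Expressions for Bayes Risks Difference

We will now derive the difference in the Bayes risks
of an arbitrary predictor $\delta(Y^{(1)})$ of $\gamma$ and the subjective
Bayes predictor $e_{SB}(Y^{(1)})$. For the quadratic loss (4.2.2),
noting that $e_{SB}(Y^{(1)}) = E_{\pi_0}[\gamma(Y) \mid Y^{(1)}]$, standard Bayesian
calculations give

$$r_{Q_m}(\pi_0, \delta) - r_{Q_m}(\pi_0, e_{SB})
= m^{-1}E_{\pi_0}\big[(\delta(Y^{(1)}) - e_{SB}(Y^{(1)}))^T Q_m (\delta(Y^{(1)}) - e_{SB}(Y^{(1)}))\big]. \tag{4.2.1.1}$$

For $\delta(Y^{(1)}) = e_{BF}(Y^{(1)})$, the rhs of (4.2.1.1) is

$$m^{-1}E_{\pi_0}\big[(e_{BF}(Y^{(1)}) - e_{SB}(Y^{(1)}))^T Q_m (e_{BF}(Y^{(1)}) - e_{SB}(Y^{(1)}))\big]
= m^{-1}\mathrm{tr}\big[Q_m E_{\pi_0}\{(e_{BF}(Y^{(1)}) - e_{SB}(Y^{(1)}))(e_{BF}(Y^{(1)}) - e_{SB}(Y^{(1)}))^T\}\big]. \tag{4.2.1.2}$$

Now from (4.2.3) and (4.2.4), we have

$$e_{BF}(Y^{(1)}) - e_{SB}(Y^{(1)})
= C(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})\big[(X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1}X^{(1)T}\Sigma_{11}^{-1}Y^{(1)} - b_0\big].$$

Using this, from (4.2.1.1) and (4.2.1.2) we obtain

$$r_{Q_m}(\pi_0, e_{BF}) - r_{Q_m}(\pi_0, e_{SB})
= r_0^{-1}m^{-1}\mathrm{tr}\big[Q_m C(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})(X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1}(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})^T C^T\big]. \tag{4.2.1.3}$$

For $\delta(Y^{(1)}) = \bar Y_{(s)}$ we have from (4.2.1.1),

$$r_{Q_m}(\pi_0, \bar Y_{(s)}) - r_{Q_m}(\pi_0, e_{SB})$$
$$= \mathrm{tr}\big[m^{-1}Q_m\, E_{\pi_0}(\bar Y_{(s)} - e_{SB}(Y^{(1)}))\, E_{\pi_0}(\bar Y_{(s)} - e_{SB}(Y^{(1)}))^T\big]
+ \mathrm{tr}\big[r_0^{-1}m^{-1}Q_m (L - A - C\Sigma_{21}\Sigma_{11}^{-1})\Sigma_{11}(L - A - C\Sigma_{21}\Sigma_{11}^{-1})^T\big]$$
$$= \mathrm{tr}\big[m^{-1}Q_m (LX^{(1)} - AX^{(1)} - CX^{(2)})\, b_0 b_0^T\, (LX^{(1)} - AX^{(1)} - CX^{(2)})^T\big]
+ \mathrm{tr}\big[r_0^{-1}m^{-1}Q_m (L - A - C\Sigma_{21}\Sigma_{11}^{-1})\Sigma_{11}(L - A - C\Sigma_{21}\Sigma_{11}^{-1})^T\big]. \tag{4.2.1.4}$$

We will use (4.2.1.3) and (4.2.1.4) repeatedly in the
following subsections.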

4.2.2 Random Regression Coefficients Model

We will examine the behavior of the risk difference
given by (4.2.1.3) and (4.2.1.4) in this special case. For
simplicity, a random regression coefficients model as
it appeared in Prasad and Rao (1990) will be considered. The
model can be described as follows:

(i) Conditional on $R = r$, $B = b$ and $B_i = b_i$, $i =
1, \ldots, m$, $Y_{ij} \sim N(b_i x_{ij}, r^{-1})$, $j = 1, \ldots, N_i$, $i =
1, \ldots, m$ independently;
(ii) conditional on $R = r$ and $B = b$, $B_i \sim N(b, (r\lambda_0)^{-1})$,
$i = 1, \ldots, m$ independently;
(iii) $B$ is uniform$(-\infty, \infty)$ and $R \sim$ gamma($\tfrac12 a_0$, $\tfrac12 g_0$)
independently.

We can write (i) and (ii) as

$$Y_{ij} = b x_{ij} + v_i x_{ij} + e_{ij},$$

($j = 1, \ldots, N_i$, $i = 1, \ldots, m$) where $e_{ij}$ and $v_i$ are mutually
independent with $v_i \sim N(0, (r\lambda_0)^{-1})$ and $e_{ij} \sim N(0, r^{-1})$.
Here $X = \mathrm{col}_{1 \le i \le m}(x_i)$, $x_i = (x_{i1}, \ldots, x_{iN_i})^T$, $i = 1, \ldots, m$,
$Z = \oplus_{i=1}^m x_i$, $\Psi = I_{N_T}$, $D(\lambda) = \lambda^{-1}I_m$ and
$\Sigma = I_{N_T} + \lambda_0^{-1}\oplus_{i=1}^m x_i x_i^T$. We
will show that under appropriate conditions the risk
difference given by (4.2.1.3) goes to zero, whereas the
risk difference given by (4.2.1.4) does not go to zero as
$m \to \infty$. To this end, we will prove two theorems. The first
theorem will prove the A.O. property of the HB predictor
whereas the second theorem proves the nonoptimality of the
sample mean vector. To prove Theorem 4.2.2.1 below, we
need the following conditions. Assume

$$\sup_{m \ge 1} \mathrm{ch}_L(Q_m) = \tau \ \text{(say)} < \infty, \tag{4.2.2.1}$$

and for two positive numbers $p_1$ and $p_2$ with $p_1 < p_2$,

$$p_1 \le x_{ij} \le p_2 \quad \text{for } j = 1, \ldots, N_i,\ i = 1, \ldots, m. \tag{4.2.2.2}$$

Before getting into the theorems, we will make a remark on
the condition (4.2.2.1) which pertains to all the theorems
in this chapter.

Remark 4.2.2.1. Note that for average squared error loss
$Q_m = I_m$ and the condition (4.2.2.1) trivially holds.

Theorem 4.2.2.1. Suppose under the prior $\pi_0$, $Y \sim
N(Xb_0, r_0^{-1}\Sigma)$ where $b_0$ is a scalar. Assume the conditions
(4.2.2.1) and (4.2.2.2) hold. Then for the random
regression coefficients model the HB predictor $e_{BF}(Y^{(1)})$ in
(4.2.4) of $\gamma$ is asymptotically optimal under the loss
(4.2.2).

To prove the theorem, we need the following lemma. We
will use this lemma repeatedly in this chapter. Also, in
our applications the matrix $P$ of the lemma happens to be
n.n.d.

Lemma  4.2.2.1.   Let  P(pxp)  be  a  symmetric  matrix  and 
y(qxp)  be  an  arbitrary  matrix.   Then 

chg(p)tr(y'^y)  <  tr(py'^y)  <  chL(P)tr(y'^y). 
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Proof  of  Lemma  4.2.2.1.   Use  the  spectral  decomposition 

P         T 

P  =   E  rj.^-Oi    where  rj .    are  the  eigen  values  of  P  and  ^-^    are 

i=l  1-1-1  1 

the  corresponding  orthonormal  eigen  vectors.   Then, 

tr(pyTy)  =  E  '?itr(^i^Ty'^y)  =  X  ''i^^[(y^i)(y^i)T- 

Since  chgCP)  <  ^i  <  ch^CP),  J^^ilj  =  Ip  ^"^  j  Ji^^^ty^  i)  (^^1)^ 
y(.S  ^i^'[)y^   =  tr(yy'^)  =  tr(y'^y),  one  gets 

hs(P)tr(yTy)  =  chs(P)  E  tr[(y^i)(yei)'^] 

<  .E ';i^r[(y^.)(y^.)'r] 

=  tr(pyTy)  <  chL(P).E  trgy^iXu^if] 

=  chL(P)tr(y'^y) . 


=  tr 


1^-'.. 


The  lemma  follows. 
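The two-sided trace bound of Lemma 4.2.2.1 can be checked numerically on a small case. The sketch below is only an illustration, not part of the original argument: it restricts $P$ to a symmetric $2 \times 2$ matrix so that the eigenvalues have a closed form, and the function names and the particular matrices are hypothetical choices.

```python
import math

def sym2_eigenvalues(a, b, d):
    """Return (smallest, largest) eigenvalue of the symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

def trace_bounds(P, U):
    """Return (ch_S(P) tr(U'U), tr(P U'U), ch_L(P) tr(U'U)).

    P is a symmetric 2x2 matrix and U is a q x 2 matrix, both given as lists of rows.
    """
    # Entries of the 2x2 Gram matrix G = U'U.
    g00 = sum(row[0] * row[0] for row in U)
    g01 = sum(row[0] * row[1] for row in U)
    g11 = sum(row[1] * row[1] for row in U)
    tr_gram = g00 + g11
    # tr(P G) for symmetric P and G.
    tr_pg = P[0][0] * g00 + 2.0 * P[0][1] * g01 + P[1][1] * g11
    ch_s, ch_l = sym2_eigenvalues(P[0][0], P[0][1], P[1][1])
    return ch_s * tr_gram, tr_pg, ch_l * tr_gram

# Hypothetical example matrices (P is positive definite here).
P = [[2.0, -1.0], [-1.0, 3.0]]
U = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
lower, middle, upper = trace_bounds(P, U)
print(lower, middle, upper)
```

The middle value is sandwiched between the other two, as the lemma asserts; the proof's only requirement on $U$ is that each $\mathrm{tr}[(U\xi_i)(U\xi_i)^T]$ is nonnegative, which holds for any real $U$.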

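Proof of Theorem 4.2.2.1. In Lemma 4.2.2.1, taking $P = Q_m$ and $U = (X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1/2}(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})^TC^T$, it follows from (4.2.1.3) that

$$r_{Q_m}(\pi_0, e_{BF}) - r_{Q_m}(\pi_0, e_{SB}) \le r_0^{-1}m^{-1}\mathrm{ch}_L(Q_m)\,\mathrm{tr}\big[C(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})(X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1}(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})^TC^T\big]. \tag{4.2.2.3}$$

Partitioning $x_i = (x_i^{(1)T}, x_i^{(2)T})^T$, $x_i^{(1)} = (x_{i1}, \dots, x_{in_i})^T$, $x_i^{(2)} = (x_{i,n_i+1}, \dots, x_{iN_i})^T$, $i = 1, \dots, m$, we can write $\Sigma_{11} = \bigoplus_{i=1}^{m}(I_{n_i} + \lambda_0^{-1}x_i^{(1)}x_i^{(1)T})$ and $\Sigma_{21} = \lambda_0^{-1}\bigoplus_{i=1}^{m}x_i^{(2)}x_i^{(1)T}$. Define $w_i = \sum_{j=1}^{n_i}x_{ij}^2/(\lambda_0 + \sum_{j=1}^{n_i}x_{ij}^2)$. Then

$$C(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)}) = \operatorname{col}_{1 \le i \le m}\big[f_i(1 - w_i)\bar{x}_{i(u)}\big],$$

where $\bar{x}_{i(u)} = (N_i - n_i)^{-1}\sum_{j=n_i+1}^{N_i}x_{ij}$, and

$$\mathrm{tr}\big[C(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})^TC^T\big] = \sum_{i=1}^{m}f_i^2\bar{x}_{i(u)}^2(1 - w_i)^2 \le \sum_{i=1}^{m}\bar{x}_{i(u)}^2, \tag{4.2.2.4}$$

$$X^{(1)T}\Sigma_{11}^{-1}X^{(1)} = \sum_{i=1}^{m}x_i^{(1)T}\big(I_{n_i} + \lambda_0^{-1}x_i^{(1)}x_i^{(1)T}\big)^{-1}x_i^{(1)} = \sum_{i=1}^{m}\Big[\sum_{j=1}^{n_i}x_{ij}^2\,\lambda_0\Big/\Big(\sum_{j=1}^{n_i}x_{ij}^2 + \lambda_0\Big)\Big] = \lambda_0\sum_{i=1}^{m}w_i. \tag{4.2.2.5}$$

From (4.2.2.3) - (4.2.2.5), we have

$$r_{Q_m}(\pi_0, e_{BF}) - r_{Q_m}(\pi_0, e_{SB}) \le r_0^{-1}\mathrm{ch}_L(Q_m)\Big(m^{-1}\sum_{i=1}^{m}\bar{x}_{i(u)}^2\Big)\lambda_0^{-1}\Big(\sum_{i=1}^{m}w_i\Big)^{-1}. \tag{4.2.2.6}$$

Now, by (4.2.2.2), $m^{-1}\sum_{i=1}^{m}\bar{x}_{i(u)}^2 = O(1)$ and $\sum_{i=1}^{m}w_i$ goes to infinity as $m$ goes to infinity. Then by (4.2.2.1), the rhs of (4.2.2.6) goes to zero as $m \to \infty$ and hence from (4.2.2.6)

$$\limsup_{m \to \infty}\big[r_{Q_m}(\pi_0, e_{BF}) - r_{Q_m}(\pi_0, e_{SB})\big] \le 0. \tag{4.2.2.7}$$

But $r_{Q_m}(\pi_0, e_{BF}) - r_{Q_m}(\pi_0, e_{SB}) \ge 0$ for all $m$ and hence

$$\liminf_{m \to \infty}\big[r_{Q_m}(\pi_0, e_{BF}) - r_{Q_m}(\pi_0, e_{SB})\big] \ge 0. \tag{4.2.2.8}$$

Combining (4.2.2.7) and (4.2.2.8), one gets the theorem.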
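To see how the right-hand side of (4.2.2.6) behaves, the following sketch evaluates it for a growing number of strata. It is an illustration under assumptions that are not in the text: $Q_m = I_m$ (so $\mathrm{ch}_L(Q_m) = 1$), two sampled and three unsampled units per stratum, and a made-up covariate pattern with values in $[1, 2]$, so that (4.2.2.2) holds with $p_1 = 1$ and $p_2 = 2$.

```python
def rhs_bound(x_sampled, x_unsampled, lam0, r0, ch_l_q=1.0):
    """Right-hand side of (4.2.2.6) for scalar covariates.

    x_sampled[i] and x_unsampled[i] hold the covariates of the sampled and
    unsampled units of stratum i; ch_l_q stands for ch_L(Q_m).
    """
    m = len(x_sampled)
    mean_sq_unsampled = 0.0   # m^{-1} * sum_i xbar_{i(u)}^2
    sum_w = 0.0               # sum_i w_i
    for xs, xu in zip(x_sampled, x_unsampled):
        xbar_u = sum(xu) / len(xu)
        mean_sq_unsampled += xbar_u ** 2 / m
        ss = sum(x * x for x in xs)
        sum_w += ss / (lam0 + ss)
    return ch_l_q * mean_sq_unsampled / (r0 * lam0 * sum_w)

def toy_design(m):
    """Hypothetical covariates taking the values 1, 1.5 and 2."""
    xs = [[1.0 + ((i + j) % 3) / 2.0 for j in range(2)] for i in range(m)]
    xu = [[1.0 + ((i + 2 * j) % 3) / 2.0 for j in range(3)] for i in range(m)]
    return xs, xu

bounds = {m: rhs_bound(*toy_design(m), lam0=2.0, r0=1.0) for m in (10, 100, 1000)}
for m, value in bounds.items():
    print(m, value)
```

Because $w_i \ge p_1^2/(\lambda_0 + p_1^2) > 0$ under (4.2.2.2), the sum of the $w_i$ grows linearly in $m$, so the bound decays at rate $1/m$ in this toy design.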

Now we will consider the sample mean vector. To prove its asymptotic nonoptimality, we need the following conditions in addition to (4.2.2.2):

$$\sup_{i \ge 1} n_i = K \ \text{(say)} < \infty, \tag{4.2.2.9}$$

$$N_i \ge n_i + 1 \ge 2 \quad \text{for } i \ge 1, \tag{4.2.2.10}$$

and

$$\lim_{m \to \infty}\mathrm{ch}_S(Q_m) = w > 0. \tag{4.2.2.11}$$

Now we state and prove the following theorem.

Theorem 4.2.2.2. Suppose under the prior $\pi_0$, $Y \sim N(Xb_0, r_0^{-1}\Sigma)$. Assume that the conditions given by (4.2.2.2) and (4.2.2.9) - (4.2.2.11) are true. Then, for the random regression coefficients model, the sample mean vector is an asymptotically nonoptimal predictor of $\gamma$.

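Proof of Theorem 4.2.2.2. Let $r_{Q_m}(\pi_0, \bar{Y}_{(s)}) - r_{Q_m}(\pi_0, e_{SB}) = d_m$. Since $d_m \ge 0$ for all $m$, it is enough to prove that

$$\liminf_{m \to \infty} d_m > 0.$$

From (4.2.1.4), using Lemma 4.2.2.1, it follows that

$$d_m \ge \mathrm{ch}_S(Q_m)m^{-1}\Big[r_0^{-1}\mathrm{tr}\big\{(L - A - C\Sigma_{21}\Sigma_{11}^{-1})\Sigma_{11}(L - A - C\Sigma_{21}\Sigma_{11}^{-1})^T\big\} + b_0^2(LX^{(1)} - AX^{(1)} - CX^{(2)})^T(LX^{(1)} - AX^{(1)} - CX^{(2)})\Big]. \tag{4.2.2.12}$$

Now,

$$(LX^{(1)} - AX^{(1)} - CX^{(2)})^T(LX^{(1)} - AX^{(1)} - CX^{(2)}) = \sum_{i=1}^{m}f_i^2(\bar{x}_{i(u)} - \bar{x}_{i(s)})^2, \tag{4.2.2.13}$$

where $\bar{x}_{i(s)} = n_i^{-1}\sum_{j=1}^{n_i}x_{ij}$, $i = 1, \dots, m$. Also,

$$L - A - C\Sigma_{21}\Sigma_{11}^{-1} = \bigoplus_{i=1}^{m}f_i\big[n_i^{-1}1_{n_i}^T - \lambda_0^{-1}(1 - w_i)\bar{x}_{i(u)}x_i^{(1)T}\big].$$

Then

$$(L - A - C\Sigma_{21}\Sigma_{11}^{-1})\Sigma_{11}(L - A - C\Sigma_{21}\Sigma_{11}^{-1})^T = \bigoplus_{i=1}^{m}f_i^2\big[n_i^{-1} + \lambda_0^{-1}(\bar{x}_{i(u)} - \bar{x}_{i(s)})^2 - \lambda_0^{-1}(1 - w_i)\bar{x}_{i(u)}^2\big]$$

and hence

$$\begin{aligned}
\mathrm{tr}\big\{(L - A - C\Sigma_{21}\Sigma_{11}^{-1})\Sigma_{11}(L - A - C\Sigma_{21}\Sigma_{11}^{-1})^T\big\}
&= \sum_{i=1}^{m}f_i^2\big[n_i^{-1} + \lambda_0^{-1}(\bar{x}_{i(u)} - \bar{x}_{i(s)})^2 - \lambda_0^{-1}(1 - w_i)\bar{x}_{i(u)}^2\big] \\
&= \sum_{i=1}^{m}f_i^2\Big[n_i^{-1} - \bar{x}_{i(s)}^2\Big(\sum_{j=1}^{n_i}x_{ij}^2\Big)^{-1} + \lambda_0^{-1}w_i(\bar{x}_{i(u)} - w_i^{-1}\bar{x}_{i(s)})^2\Big] \\
&= \sum_{i=1}^{m}f_i^2\Big[\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{i(s)})^2\Big/\Big(n_i\sum_{j=1}^{n_i}x_{ij}^2\Big) + \lambda_0^{-1}w_i(\bar{x}_{i(u)} - w_i^{-1}\bar{x}_{i(s)})^2\Big] \\
&\ge \sum_{i=1}^{m}f_i^2\lambda_0^{-1}w_i(\bar{x}_{i(u)} - w_i^{-1}\bar{x}_{i(s)})^2.
\end{aligned} \tag{4.2.2.14}$$

From (4.2.2.12) - (4.2.2.14),

$$\begin{aligned}
d_m &\ge \mathrm{ch}_S(Q_m)m^{-1}\sum_{i=1}^{m}f_i^2\big[(r_0\lambda_0)^{-1}w_i(\bar{x}_{i(u)} - w_i^{-1}\bar{x}_{i(s)})^2 + b_0^2(\bar{x}_{i(u)} - \bar{x}_{i(s)})^2\big] \\
&\ge \mathrm{ch}_S(Q_m)(K + 1)^{-2}m^{-1}\sum_{i=1}^{m}\big[(r_0\lambda_0)^{-1}w_i(\bar{x}_{i(u)} - w_i^{-1}\bar{x}_{i(s)})^2 + b_0^2(\bar{x}_{i(u)} - \bar{x}_{i(s)})^2\big].
\end{aligned} \tag{4.2.2.15}$$

Let

$$S_1 = \big\{i : 1 \le i \le m \text{ and } \bar{x}_{i(u)} \le \tfrac{1}{2}(1 + w_i^{-1})\bar{x}_{i(s)}\big\}$$

and

$$S_2 = \big\{i : 1 \le i \le m \text{ and } \bar{x}_{i(u)} > \tfrac{1}{2}(1 + w_i^{-1})\bar{x}_{i(s)}\big\}.$$

Then,

$$\begin{aligned}
\sum_{i=1}^{m}\big[(r_0\lambda_0)^{-1}w_i(\bar{x}_{i(u)} - w_i^{-1}\bar{x}_{i(s)})^2 + b_0^2(\bar{x}_{i(u)} - \bar{x}_{i(s)})^2\big]
&\ge \sum_{i \in S_1}(r_0\lambda_0)^{-1}w_i\lambda_0^2\bar{x}_{i(s)}^2\Big/\Big(2\sum_{j=1}^{n_i}x_{ij}^2\Big)^2 + \sum_{i \in S_2}b_0^2\lambda_0^2\bar{x}_{i(s)}^2\Big/\Big(2\sum_{j=1}^{n_i}x_{ij}^2\Big)^2 \\
&\ge \lambda_0r_0^{-1}(4Kp_2^2)^{-1}(\lambda_0 + Kp_2^2)^{-1}\sum_{i \in S_1}\bar{x}_{i(s)}^2 + b_0^2\lambda_0^2(4K^2p_2^4)^{-1}\sum_{i \in S_2}\bar{x}_{i(s)}^2, \quad \text{by (4.2.2.2), (4.2.2.9) and (4.2.2.10)} \\
&\ge d\sum_{i=1}^{m}\bar{x}_{i(s)}^2 \ge dp_1^2m,
\end{aligned} \tag{4.2.2.16}$$

by (4.2.2.2), where

$$d = \min\big\{\lambda_0r_0^{-1}(4Kp_2^2)^{-1}(\lambda_0 + Kp_2^2)^{-1},\ b_0^2\lambda_0^2(4K^2p_2^4)^{-1}\big\}.$$

From (4.2.2.15) and (4.2.2.16) we have

$$d_m \ge \mathrm{ch}_S(Q_m)m^{-1}(K + 1)^{-2}dp_1^2m,$$

and by (4.2.2.11)

$$\liminf_{m \to \infty} d_m \ge w(K + 1)^{-2}dp_1^2 > 0 \quad \text{provided } b_0 \ne 0.$$

This completes the proof of Theorem 4.2.2.2.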
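The lower bounds used in (4.2.2.15) and (4.2.2.16) rest on a few elementary inequalities that the proof does not spell out. The block below sketches them. It assumes that $f_i = 1 - n_i/N_i$ is the unsampled fraction of stratum $i$, a definition not restated in this section, and it writes $s_i = \sum_{j=1}^{n_i}x_{ij}^2$ as shorthand introduced only for this sketch.

```latex
% Assumption (not restated in this section): f_i = 1 - n_i / N_i.
% Shorthand for this sketch only: s_i = \sum_{j=1}^{n_i} x_{ij}^2, so that w_i^{-1} - 1 = \lambda_0 / s_i.
\begin{align*}
% Step 1: bound on f_i from (4.2.2.10) and (4.2.2.9), used in (4.2.2.15).
f_i = 1 - \frac{n_i}{N_i} &\ge 1 - \frac{n_i}{n_i + 1} = \frac{1}{n_i + 1} \ge \frac{1}{K + 1}, \\
% Step 2: on S_1 the first summand is bounded below.
i \in S_1:\quad w_i^{-1}\bar{x}_{i(s)} - \bar{x}_{i(u)}
  &\ge \tfrac{1}{2}\,(w_i^{-1} - 1)\,\bar{x}_{i(s)} = \frac{\lambda_0\,\bar{x}_{i(s)}}{2 s_i} \ge 0, \\
% Step 3: on S_2 the second summand is bounded below.
i \in S_2:\quad \bar{x}_{i(u)} - \bar{x}_{i(s)}
  &> \tfrac{1}{2}\,(w_i^{-1} - 1)\,\bar{x}_{i(s)} = \frac{\lambda_0\,\bar{x}_{i(s)}}{2 s_i} > 0, \\
% Step 4: s_i <= n_i p_2^2 <= K p_2^2 by (4.2.2.2) and (4.2.2.9).
\frac{w_i\,\lambda_0^2}{r_0 \lambda_0\,(2 s_i)^2}
  = \frac{\lambda_0}{4 r_0\, s_i\,(\lambda_0 + s_i)}
  &\ge \frac{\lambda_0}{4 r_0\, K p_2^2\,(\lambda_0 + K p_2^2)},
\qquad
\frac{b_0^2 \lambda_0^2}{(2 s_i)^2} \ge \frac{b_0^2 \lambda_0^2}{4 K^2 p_2^4}.
\end{align*}
```

Squaring the inequalities in Steps 2 and 3 and applying Step 4 gives the two sums over $S_1$ and $S_2$ in (4.2.2.16).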

Remark 4.2.2.2. Note that for average squared error loss $Q_m = I_m$, and (4.2.2.1) and (4.2.2.11) are satisfied. Then from the preceding two theorems it follows that while the sample mean vector is asymptotically nonoptimal, the HB predictor is asymptotically optimal.

4.2.3   Nested  Error  Regression  Model 

In  this  subsection  we  will  examine  the  asymptotic 
behavior  of  the  risk  differences  given  in  (4.2.1.3)  and 
(4.2.1.4)  under  the  nested  error  regression  model.   The 
corresponding  linear  model  is  given  by  the  equation 
(2.2.3).   Here  *  =  I^  ,  Aq  =  Aq  ,  P(Aq)  =  \-(^l^,    Z^^^    = 

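$\bigoplus_{i=1}^{m}1_{n_i}$, $Z^{(2)} = \bigoplus_{i=1}^{m}1_{N_i - n_i}$ and $\Sigma = \bigoplus_{i=1}^{m}(I_{N_i} + \lambda_0^{-1}J_{N_i})$. Also $X^{(1)} = \operatorname{col}_{1 \le i \le m}\operatorname{col}_{1 \le j \le n_i}(x_{ij}^T)$ and $X^{(2)} = \operatorname{col}_{1 \le i \le m}\operatorname{col}_{n_i + 1 \le j \le N_i}(x_{ij}^T)$.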

We will prove here that under suitable conditions, given below, the risk difference between the HB and the subjective Bayes predictor goes to zero as $m \to \infty$, whereas the positive risk difference between the sample mean vector and the subjective Bayes predictor remains bounded away from zero. These two results will be provided in the form of theorems. The first theorem in this subsection proves the A.O. property of the HB predictor, whereas the second theorem proves a negative result about the sample mean vector, which is design unbiased for the finite population mean vector under simple random sampling.

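To prove Theorem 4.2.3.1, we need to assume that $\sum_{i=1}^{m}\bar{x}_{i(s)}\bar{x}_{i(s)}^T$ is p.d. Moreover,

$$\limsup_{m \to \infty}\mathrm{ch}_L\Big[\Big(\sum_{i=1}^{m}\bar{x}_{i(s)}\bar{x}_{i(s)}^T\Big/m\Big)^{-1}\Big] < \infty, \tag{4.2.3.1}$$

$$\sum_{i=1}^{m}\bar{x}_{i(s)}^T\bar{x}_{i(s)} = O(m), \tag{4.2.3.2}$$

and

$$\sum_{i=1}^{m}\bar{x}_{i(u)}^T\bar{x}_{i(u)} = O(m), \tag{4.2.3.3}$$

where $\bar{x}_{i(s)} = n_i^{-1}\sum_{j=1}^{n_i}x_{ij}$ and $\bar{x}_{i(u)} = (N_i - n_i)^{-1}\sum_{j=n_i+1}^{N_i}x_{ij}$, $i = 1, \dots, m$.

Remark 4.2.3.1. Note that if for some p.d. matrix $Q$, $\lim_{m \to \infty}\big(\sum_{i=1}^{m}\bar{x}_{i(s)}\bar{x}_{i(s)}^T/m\big) = Q$, then conditions (4.2.3.1) and (4.2.3.2) hold.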

Also, to prove Theorem 4.2.3.1 below, we need an inequality involving the eigenvalues of matrices. To this end, we state and prove the following lemma.

Lemma 4.2.3.1. For a symmetric n.n.d. matrix $P\,(p \times p)$ and a symmetric matrix $T\,(p \times p)$,

(i) $\mathrm{ch}_L(T) \le \mathrm{ch}_L(P + T)$;

(ii) $\mathrm{ch}_S(T) \le \mathrm{ch}_S(P + T)$.

Proof of Lemma 4.2.3.1. From p. 62 of Rao (1973), we have

$$\mathrm{ch}_L(T) = \sup_{a \ne 0}\frac{a^TTa}{a^Ta} \le \sup_{a \ne 0}\frac{a^T(P + T)a}{a^Ta} = \mathrm{ch}_L(P + T).$$

Similarly (ii) can be proved.

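Theorem 4.2.3.1. Suppose the prior $\pi_0$ is given by $Y \sim N(Xb_0, r_0^{-1}\Sigma)$, and the conditions given by (4.2.2.1), (4.2.2.9) and (4.2.3.1) - (4.2.3.3) are true. Then $e_{BF}(Y^{(1)})$, the HB predictor of $\gamma$ under the nested error regression model, is asymptotically optimal under the loss (4.2.2).

Proof of Theorem 4.2.3.1. Let $a_m = r_{Q_m}(\pi_0, e_{BF}) - r_{Q_m}(\pi_0, e_{SB})$. Note that $a_m \ge 0$ for all $m$. To prove the theorem, it is enough to show $\lim_{m \to \infty}a_m = 0$.

Now, from (4.2.1.3), taking $P = Q_m$ and $U = (X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1/2}(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})^TC^T$ in Lemma 4.2.2.1, it follows that

$$a_m \le r_0^{-1}m^{-1}\mathrm{ch}_L(Q_m)\,\mathrm{tr}\big[C(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})(X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1}(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})^TC^T\big]. \tag{4.2.3.4}$$

Now taking $P = (X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1}$ and $U^T = (X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})^TC^T$ in Lemma 4.2.2.1, we have from (4.2.3.4) that

$$a_m \le r_0^{-1}m^{-1}\mathrm{ch}_L(Q_m)\,\mathrm{ch}_L\big[(X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1}\big]\,\mathrm{tr}\big[C(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})^TC^T\big]. \tag{4.2.3.5}$$

Here $\Sigma_{21}\Sigma_{11}^{-1} = \bigoplus_{i=1}^{m}(n_i + \lambda_0)^{-1}1_{N_i - n_i}1_{n_i}^T$, $\Sigma_{21}\Sigma_{11}^{-1}X^{(1)} = \operatorname{col}_{1 \le i \le m}\big(n_i(n_i + \lambda_0)^{-1}1_{N_i - n_i}\bar{x}_{i(s)}^T\big)$ and

$$X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)} = \operatorname{col}_{1 \le i \le m}\big[\operatorname{col}_{n_i + 1 \le j \le N_i}(x_{ij}^T) - n_i(n_i + \lambda_0)^{-1}1_{N_i - n_i}\bar{x}_{i(s)}^T\big].$$

From these we have

$$\begin{aligned}
\mathrm{tr}\big[C(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})(X^{(2)} - \Sigma_{21}\Sigma_{11}^{-1}X^{(1)})^TC^T\big]
&= \sum_{i=1}^{m}f_i^2\big(\bar{x}_{i(u)} - n_i(n_i + \lambda_0)^{-1}\bar{x}_{i(s)}\big)^T\big(\bar{x}_{i(u)} - n_i(n_i + \lambda_0)^{-1}\bar{x}_{i(s)}\big) \\
&\le \sum_{i=1}^{m}\big(\bar{x}_{i(u)} - n_i(n_i + \lambda_0)^{-1}\bar{x}_{i(s)}\big)^T\big(\bar{x}_{i(u)} - n_i(n_i + \lambda_0)^{-1}\bar{x}_{i(s)}\big) \\
&\le 2\Big[\sum_{i=1}^{m}\bar{x}_{i(u)}^T\bar{x}_{i(u)} + \sum_{i=1}^{m}\bar{x}_{i(s)}^T\bar{x}_{i(s)}\Big],
\end{aligned} \tag{4.2.3.6}$$

since for two vectors $a_1\,(p \times 1)$, $a_2\,(p \times 1)$, $(a_1 - a_2)^T(a_1 - a_2) \le 2(a_1^Ta_1 + a_2^Ta_2)$, and $n_i(n_i + \lambda_0)^{-1} \le 1$. From (2.4.2) we have

$$X^{(1)T}\Sigma_{11}^{-1}X^{(1)} = \sum_{i=1}^{m}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{i(s)})(x_{ij} - \bar{x}_{i(s)})^T + \lambda_0\sum_{i=1}^{m}n_i(n_i + \lambda_0)^{-1}\bar{x}_{i(s)}\bar{x}_{i(s)}^T.$$

From this and (4.2.2.9) we get that $X^{(1)T}\Sigma_{11}^{-1}X^{(1)} - \lambda_0(K + \lambda_0)^{-1}\sum_{i=1}^{m}\bar{x}_{i(s)}\bar{x}_{i(s)}^T$ is n.n.d. and hence from Lemma 4.2.3.1, taking $P = X^{(1)T}\Sigma_{11}^{-1}X^{(1)} - \lambda_0(K + \lambda_0)^{-1}\sum_{i=1}^{m}\bar{x}_{i(s)}\bar{x}_{i(s)}^T$ and $T = \lambda_0(K + \lambda_0)^{-1}\sum_{i=1}^{m}\bar{x}_{i(s)}\bar{x}_{i(s)}^T$, we have

$$\mathrm{ch}_S\Big(\lambda_0(K + \lambda_0)^{-1}\sum_{i=1}^{m}\bar{x}_{i(s)}\bar{x}_{i(s)}^T\Big) \le \mathrm{ch}_S\big(X^{(1)T}\Sigma_{11}^{-1}X^{(1)}\big),$$

i.e.

$$\mathrm{ch}_L\big[(X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1}\big] \le (1 + K/\lambda_0)\,\mathrm{ch}_L\Big[\Big(\sum_{i=1}^{m}\bar{x}_{i(s)}\bar{x}_{i(s)}^T\Big)^{-1}\Big]. \tag{4.2.3.7}$$

From (4.2.3.5) - (4.2.3.7), we have

$$\begin{aligned}
a_m &\le 2r_0^{-1}m^{-1}\mathrm{ch}_L(Q_m)(1 + K/\lambda_0)\,\mathrm{ch}_L\Big[\Big(\sum_{i=1}^{m}\bar{x}_{i(s)}\bar{x}_{i(s)}^T\Big)^{-1}\Big]\Big[\sum_{i=1}^{m}\bar{x}_{i(u)}^T\bar{x}_{i(u)} + \sum_{i=1}^{m}\bar{x}_{i(s)}^T\bar{x}_{i(s)}\Big] \\
&= 2r_0^{-1}(1 + K/\lambda_0)\,\mathrm{ch}_L(Q_m)\,\mathrm{ch}_L\Big[\Big(\sum_{i=1}^{m}\bar{x}_{i(s)}\bar{x}_{i(s)}^T\Big/m\Big)^{-1}\Big]\Big[\sum_{i=1}^{m}\bar{x}_{i(u)}^T\bar{x}_{i(u)}\Big/m + \sum_{i=1}^{m}\bar{x}_{i(s)}^T\bar{x}_{i(s)}\Big/m\Big]m^{-1}.
\end{aligned} \tag{4.2.3.8}$$

The rhs of (4.2.3.8) goes to zero as $m$ goes to infinity, by conditions (4.2.2.1) and (4.2.3.1) - (4.2.3.3). Hence from (4.2.3.8)

$$\limsup_{m \to \infty}a_m \le 0.$$

But $a_m \ge 0$ for all $m$. Consequently $\lim_{m \to \infty}a_m = 0$ and the proof of Theorem 4.2.3.1 is complete.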


Remark 4.2.3.2. Note that Theorem 4.2.3.1 remains true if we assume $\sum_{i=1}^{m}\bar{x}_{i(u)}^T\bar{x}_{i(u)} = o(m^2)$ and $\sum_{i=1}^{m}\bar{x}_{i(s)}^T\bar{x}_{i(s)} = o(m^2)$, or if we replace (4.2.3.2) by $\sum_{i=1}^{m}\bar{x}_{i(s)}^T\bar{x}_{i(s)} = O(m^{1+t})$ and (4.2.3.3) by $\sum_{i=1}^{m}\bar{x}_{i(u)}^T\bar{x}_{i(u)} = O(m^{1+t})$ for any $t < 1$. In a special case of the nested error regression model, to be considered in Section 4.4, $x_{ij} = x_i$, $j = 1, \dots, N_i$, $i = 1, \dots, m$; in this case conditions (4.2.3.2) and (4.2.3.3) are identical. Also in this case $\sum_{i=1}^{m}\bar{x}_{i(s)}\bar{x}_{i(s)}^T = \sum_{i=1}^{m}x_ix_i^T$ is p.d. since it is assumed that $\mathrm{rank}(X^{(1)}) = p$.

Now we will briefly consider the infinite population set up for this model. An alternative predictor for $\gamma$ in the nested error regression model, as we have seen in Section 2.4, is the predictor of $\bar{X}b + v = \mu$ (say), the conditional mean vector given the values of the covariates and the realized value of the random vector $v$ of the stratum effect, where $\bar{X} = \operatorname{col}_{1 \le i \le m}(\bar{x}_{i(p)}^T)$ and $\bar{x}_{i(p)} = N_i^{-1}\sum_{j=1}^{N_i}x_{ij}$. From Section 3.2 one can show that the HB predictor $\hat{\mu}_{HB}(Y^{(1)})$ of $\mu$ under the loss (4.2.2) is given by

$$\hat{\mu}_{HB} = \Big[\bar{X} - \operatorname{col}_{1 \le i \le m}\big(n_i(n_i + \lambda_0)^{-1}\bar{x}_{i(s)}^T\big)\Big](X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1}X^{(1)T}\Sigma_{11}^{-1}Y^{(1)} + \operatorname{col}_{1 \le i \le m}\big(n_i(n_i + \lambda_0)^{-1}\bar{Y}_{i(s)}\big). \tag{4.2.3.9}$$


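Now assume that the prior $\pi_0$ specifies that $(Y^{(1)T}, v^T)^T$ is MVN with $E_{\pi_0}(Y^{(1)}) = X^{(1)}b_0$, $E_{\pi_0}(v) = 0$, $V_{\pi_0}(Y^{(1)}) = r_0^{-1}\Sigma_{11}$, $V_{\pi_0}(v) = (r_0\lambda_0)^{-1}I_m$ and $\mathrm{Cov}_{\pi_0}(Y^{(1)}, v) = (r_0\lambda_0)^{-1}Z^{(1)}$. Then the subjective Bayes predictor $\hat{\mu}_{SB}$ (say) of $\mu$ under the loss (4.2.2) is given by

$$\hat{\mu}_{SB} = \Big[\bar{X} - \operatorname{col}_{1 \le i \le m}\big(n_i(n_i + \lambda_0)^{-1}\bar{x}_{i(s)}^T\big)\Big]b_0 + \operatorname{col}_{1 \le i \le m}\big(n_i(n_i + \lambda_0)^{-1}\bar{Y}_{i(s)}\big). \tag{4.2.3.10}$$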


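It can be shown that

$$r_{Q_m}(\pi_0, \hat{\mu}_{HB}) - r_{Q_m}(\pi_0, \hat{\mu}_{SB}) = (mr_0)^{-1}\mathrm{tr}\Big[Q_m\Big\{\bar{X} - \operatorname{col}_{1 \le i \le m}\big(n_i(\lambda_0 + n_i)^{-1}\bar{x}_{i(s)}^T\big)\Big\}(X^{(1)T}\Sigma_{11}^{-1}X^{(1)})^{-1}\Big\{\bar{X} - \operatorname{col}_{1 \le i \le m}\big(n_i(\lambda_0 + n_i)^{-1}\bar{x}_{i(s)}^T\big)\Big\}^T\Big]$$

and that under the same conditions of Theorem 4.2.3.1 this risk difference tends to 0 as $m \to \infty$. Thus, $\hat{\mu}_{HB}$ also possesses the A.O. property. Now in the following theorem we will establish that the sample mean vector $\bar{Y}_{(s)}$ is asymptotically nonoptimal under certain conditions.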

Theorem 4.2.3.2. Under the prior $\pi_0$ given by $Y \sim N(Xb_0, r_0^{-1}\Sigma)$ and the conditions (4.2.2.9) - (4.2.2.11), for the nested error regression model the sample mean vector $\bar{Y}_{(s)}$ is asymptotically nonoptimal.

Proof of Theorem 4.2.3.2. As in Theorem 4.2.2.2, it is enough to show that for $d_m = r_{Q_m}(\pi_0, \bar{Y}_{(s)}) - r_{Q_m}(\pi_0, e_{SB})$,

$$\liminf_{m \to \infty}d_m > 0.$$

From (4.2.1.4) and Lemma 4.2.2.1 we have

$$d_m \ge r_0^{-1}m^{-1}\mathrm{ch}_S(Q_m)\,\mathrm{tr}\big[(L - A - C\Sigma_{21}\Sigma_{11}^{-1})\Sigma_{11}(L - A - C\Sigma_{21}\Sigma_{11}^{-1})^T\big]. \tag{4.2.3.11}$$

In this case, $L - A - C\Sigma_{21}\Sigma_{11}^{-1} = \lambda_0\bigoplus_{i=1}^{m}f_i(\lambda_0 + n_i)^{-1}n_i^{-1}1_{n_i}^T$ and hence $(L - A - C\Sigma_{21}\Sigma_{11}^{-1})\Sigma_{11}(L - A - C\Sigma_{21}\Sigma_{11}^{-1})^T = \lambda_0\,\mathrm{Diag}\big(f_1^2n_1^{-1}(\lambda_0 + n_1)^{-1}, \dots, f_m^2n_m^{-1}(\lambda_0 + n_m)^{-1}\big)$. Then from (4.2.3.11) we have

$$d_m \ge r_0^{-1}m^{-1}\mathrm{ch}_S(Q_m)\lambda_0\sum_{i=1}^{m}f_i^2n_i^{-1}(\lambda_0 + n_i)^{-1} \ge r_0^{-1}\lambda_0(\lambda_0 + K)^{-1}K^{-1}(K + 1)^{-2}\mathrm{ch}_S(Q_m)$$

by (4.2.2.9) and (4.2.2.10). Hence, by (4.2.2.11)

$$\liminf_{m \to \infty}d_m \ge r_0^{-1}\lambda_0(\lambda_0 + K)^{-1}K^{-1}(K + 1)^{-2}w > 0$$

and the theorem follows.

4.3 Asymptotic Optimality with Known First Stage Variance Component: Fay-Herriot Model

Consider the following hierarchical model.

I. Conditional on $\Theta = \theta$, $B = b$ and $\Delta = \delta$,

$$Y_i = (Y_{i1}, \dots, Y_{in_i})^T \sim N(\theta_i1_{n_i}, V_iI_{n_i}),$$

$i = 1, \dots, m$ independently, where $V_i$ $(> 0)$ is the known sampling variance for the $i$th stratum;

II. conditional on $B = b = (b_1, \dots, b_p)^T$ and $\Delta = \delta$,

$$\Theta \sim N(X_*b, \delta I_m),$$

where $\Theta = (\Theta_1, \dots, \Theta_m)^T$, $\theta = (\theta_1, \dots, \theta_m)^T$ and $X_* = (x_1, \dots, x_m)^T$, each $x_i$ being a $p$-component column vector. It is assumed that $m > p$ and $\mathrm{rank}(X_*) = p$.

III. $B$ and $\Delta$ are marginally independent with $B \sim \mathrm{uniform}(R^p)$, and $\Delta$ having a (proper or improper) distribution $h(\delta)$ on $(0, \infty)$. The explicit form of $h(\delta)$ will be provided soon.

The model given above by I and II is known as the Fay-Herriot model and was referred to in Section 2.5. Here we would like to show that the HB predictor of $\gamma$, the finite population mean vector, is asymptotically optimal under the loss (4.2.2). We will accomplish this by showing that the HB predictor of $\theta$ is asymptotically optimal. Assume that we have the sample mean vector $\bar{Y}_{(s)} = (\bar{Y}_{1(s)}, \dots, \bar{Y}_{m(s)})^T$, where $\bar{Y}_{i(s)}$ is based on $n_i$ observations, $i = 1, \dots, m$. Assume the $n_i$ are so chosen that $V_i/n_i = V$ for all $i = 1, \dots, m$. In that case, from I-II we have

I'. conditional on $\Theta = \theta$, $B = b$ and $\Delta = \delta$,

$$\bar{Y}_{(s)} \sim N(\theta, VI_m);$$

II'. conditional on $B = b$ and $\Delta = \delta$,

$$\Theta \sim N(X_*b, \delta I_m);$$

and we replace III by

III'. $B$ and $\Delta$ are marginally independent with $B \sim \mathrm{uniform}(R^p)$, and $\Delta$ having a Type II (proper or improper) beta pdf given by

$$h(\delta) \propto \delta^{\alpha - 1}(V + \delta)^{-(\alpha + \beta)}I_{[0 < \delta < \infty]}, \tag{4.3.1}$$

with $\alpha > 0$ and $\beta$ (real). In order to ensure a proper posterior distribution for $\Delta$, we shall impose some restriction on $\beta$ later.
Some authors considered models which are either special cases or generalizations in some sense of the model given by I'-III' with $Y$ replacing $\bar{Y}_{(s)}$. For example, in Morris (1981), the case $p = 1$ and $X_* = 1_m$ was considered. He used the notation $\mu$ for $b_1$ and obtained HB estimators of $\theta$ when $\alpha = 1$ and $\beta = -1$. In Morris (1983), stages I' and II' of the model are considered in their full generality, but $b$ is assumed known, and an EB rather than an HB procedure is used in the sense that no distribution on $\Delta$ is assigned, and it is estimated on the basis of the marginal distribution of $Y$. Ghosh (1989) considered all the three stages of the model with $\alpha = 1$ and $\beta = -1$. Strawderman (1971) considered the model with known $b$ and $\alpha = 1$, $\beta > 0$, while Faith (1978) considered the case of known $b$, but assigned a general class of priors to $\Delta$ including but not limited to the Type-II beta density as given in (4.3.1) with $\alpha > 1$ and $\beta > 0$.

Throughout we assume without mention that $m$ is so large that $\frac{1}{2}(m - p) + \beta > 0$. Algebraic manipulations lead to the following facts from I'-III':

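(i) conditional on $\bar{Y}_{(s)} = \bar{y}_{(s)}$ and $\Delta = \delta$,

$$\Theta \sim N(H\bar{y}_{(s)}, VH), \tag{4.3.2}$$

where $H = V^{-1}(V^{-1} + \delta^{-1})^{-1}(I_m + V\delta^{-1}P_{X_*})$ and $P_{X_*} = X_*(X_*^TX_*)^{-1}X_*^T$;

(ii) conditional on $\bar{Y}_{(s)} = \bar{y}_{(s)}$, $\Delta$ has pdf

$$f(\delta \mid \bar{y}_{(s)}) \propto \exp\big[-\tfrac{1}{2}(V + \delta)^{-1}\bar{y}_{(s)}^T(I_m - P_{X_*})\bar{y}_{(s)}\big](V + \delta)^{-\frac{1}{2}(m - p)}\delta^{\alpha - 1}(V + \delta)^{-(\alpha + \beta)}, \quad 0 < \delta < \infty. \tag{4.3.3}$$

Let $U = V/(V + \Delta)$ and $u = V/(V + \delta)$. Then it follows from (4.3.2) that

$$E(\Theta \mid u, \bar{y}_{(s)}) = (1 - u)\bar{y}_{(s)} + uP_{X_*}\bar{y}_{(s)}. \tag{4.3.4}$$

Also, writing $s_m = \bar{y}_{(s)}^T(I_m - P_{X_*})\bar{y}_{(s)}/V$, it follows from (4.3.3) that the conditional pdf of $U$ given $\bar{Y}_{(s)} = \bar{y}_{(s)}$ is

$$f_{\alpha,\beta}(u \mid \bar{y}_{(s)}) \propto \exp\big(-\tfrac{1}{2}us_m\big)u^{\frac{1}{2}(m - p) + \beta - 1}(1 - u)^{\alpha - 1}I_{[0 < u < 1]}. \tag{4.3.5}$$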

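Accordingly, under the quadratic loss $L(\theta, a)$ with $L$ given in (4.2.2), the HB estimator of $\theta$ is given by

$$e_{BI}(\bar{Y}_{(s)}) = E(\Theta \mid \bar{Y}_{(s)}) = \big(1 - E_{\alpha,\beta}(U \mid \bar{Y}_{(s)})\big)\bar{Y}_{(s)} + E_{\alpha,\beta}(U \mid \bar{Y}_{(s)})P_{X_*}\bar{Y}_{(s)}, \tag{4.3.6}$$

where

$$E_{\alpha,\beta}(U \mid \bar{Y}_{(s)}) = \int_0^1uf_{\alpha,\beta}(u \mid \bar{Y}_{(s)})\,du = \frac{\int_0^1u^{\frac{1}{2}(m - p) + \beta}(1 - u)^{\alpha - 1}\exp(-\frac{1}{2}uS_m)\,du}{\int_0^1u^{\frac{1}{2}(m - p) + \beta - 1}(1 - u)^{\alpha - 1}\exp(-\frac{1}{2}uS_m)\,du} \tag{4.3.7}$$

and

$$S_m = \bar{Y}_{(s)}^T(I_m - P_{X_*})\bar{Y}_{(s)}/V.$$

In the above $E_{\alpha,\beta}$ denotes the expectation w.r.t. the pdf $f_{\alpha,\beta}$ given in (4.3.5).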
We shall now evaluate the performance of the HB estimator $e_{BI}$ of $\theta$ for large $m$ under the loss (4.2.2) and the $N(X_*b_0, \delta_0I_m)$ prior for $\theta$. The subjective Bayes estimator of $\theta$ under this prior, say $\pi_0$, and the loss (4.2.2) is given by

$$e_{SB}(\bar{Y}_{(s)}) = (1 - u_0)\bar{Y}_{(s)} + u_0X_*b_0, \tag{4.3.8}$$

where $u_0 = V/(V + \delta_0)$.

In this set up, for an estimator $e$ of $\theta$ based on $\bar{Y}_{(s)}$, we denote its Bayes risk under the prior $\pi_0$ and the loss $L$ in (4.2.2) by

$$r_{Q_m}(\pi_0, e) = E_{\pi_0}\big[E_\theta L\big(\theta, e(\bar{Y}_{(s)})\big)\big], \tag{4.3.9}$$

where the expectation $E_\theta$ inside the square bracket is w.r.t. the conditional distribution $\bar{Y}_{(s)} \sim N(\theta, VI_m)$ and the outer expectation $E_{\pi_0}$ is w.r.t. the prior distribution $\pi_0$.

As in (4.2.1.1), we have

$$\begin{aligned}
r_{Q_m}(\pi_0, e) - r_{Q_m}(\pi_0, e_{SB})
&= m^{-1}E_{\pi_0}\Big[E_\theta\big\{\big(e(\bar{Y}_{(s)}) - e_{SB}(\bar{Y}_{(s)})\big)^TQ_m\big(e(\bar{Y}_{(s)}) - e_{SB}(\bar{Y}_{(s)})\big)\big\}\Big] \\
&= m^{-1}E^*\Big[\big(e(\bar{Y}_{(s)}) - e_{SB}(\bar{Y}_{(s)})\big)^TQ_m\big(e(\bar{Y}_{(s)}) - e_{SB}(\bar{Y}_{(s)})\big)\Big],
\end{aligned} \tag{4.3.10}$$

where the expectation $E^*$ is w.r.t. the marginal distribution $m_{\pi_0}(\bar{y}_{(s)})$ of $\bar{Y}_{(s)} \sim N(X_*b_0, (V + \delta_0)I_m)$ obtained by averaging the (conditional) distribution of $\bar{Y}_{(s)}$ given in I' w.r.t. the prior $\pi_0$. We first show that $r_{Q_m}(\pi_0, e_{BI}) - r_{Q_m}(\pi_0, e_{SB}) \to 0$ as $m \to \infty$. To achieve this, we need to prove a series of lemmas. Denote the expectation $E_{\alpha,\beta}(U \mid \bar{Y}_{(s)})$ given in (4.3.7) by $T_m(\bar{Y}_{(s)}; \alpha, \beta)$, that is

$$T_m(\bar{Y}_{(s)}; \alpha, \beta) = \frac{\int_0^1u^{\frac{1}{2}(m - p) + \beta}(1 - u)^{\alpha - 1}\exp(-\frac{1}{2}uS_m)\,du}{\int_0^1u^{\frac{1}{2}(m - p) + \beta - 1}(1 - u)^{\alpha - 1}\exp(-\frac{1}{2}uS_m)\,du}. \tag{4.3.11}$$

We subscripted it by $m$ to show its dependence on $m$.

Note that $T_m(\bar{Y}_{(s)}; \alpha, \beta)$ is a statistic since $\alpha$ and $\beta$ are known prior parameters as appeared in III'. The first lemma of this section shows that for $\alpha = 1$, $T_m(\bar{Y}_{(s)}; 1, \beta) \xrightarrow{a.s.} u_0$ when $\theta$ has the prior $\pi_0$, i.e., marginally $\bar{Y}_{(s)} \sim N(X_*b_0, (V + \delta_0)I_m)$.

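Lemma 4.3.1. Suppose $\bar{Y}_{(s)} \sim N(X_*b_0, (V + \delta_0)I_m)$. Then $T_m(\bar{Y}_{(s)}; 1, \beta)$ converges a.s. (as $m \to \infty$) to $u_0$.

Proof of Lemma 4.3.1. For $\alpha = 1$, from (4.3.11) we have

$$T_m(\bar{Y}_{(s)}; 1, \beta) = \int_0^1u^{\frac{1}{2}(m - p) + \beta}\exp\big(-\tfrac{1}{2}uS_m\big)\,du\Big/\int_0^1u^{\frac{1}{2}(m - p) + \beta - 1}\exp\big(-\tfrac{1}{2}uS_m\big)\,du. \tag{4.3.12}$$

First, integrating by parts the numerator in (4.3.12), it follows from (4.3.12) that

$$T_m(\bar{Y}_{(s)}; 1, \beta) = (m - p + 2\beta)/S_m - \Big[\big(\tfrac{1}{2}S_m\big)\exp\big(\tfrac{1}{2}S_m\big)\int_0^1u^{\frac{1}{2}(m - p) + \beta - 1}\exp\big(-\tfrac{1}{2}uS_m\big)\,du\Big]^{-1}. \tag{4.3.13}$$

Assume first that $m - p$ is an even integer and $\beta$ is an integer. Using the symbol $(d)_r = d(d - 1)\cdots(d - r + 1)$ for $d \ge r$, successive integration by parts leads to

$$\int_0^1u^{\frac{1}{2}(m - p) + \beta - 1}\exp\big(-\tfrac{1}{2}uS_m\big)\,du = \big(\tfrac{1}{2}S_m\big)^{-(\frac{1}{2}(m - p) + \beta)}\big(\tfrac{1}{2}(m - p) + \beta - 1\big)! - \sum_{r=0}^{\frac{1}{2}(m - p) + \beta - 1}\big(\tfrac{1}{2}(m - p) + \beta - 1\big)_r\big(\tfrac{1}{2}S_m\big)^{-r-1}\exp\big(-\tfrac{1}{2}S_m\big). \tag{4.3.14}$$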
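The closed form (4.3.14) can be compared against direct numerical integration. In the sketch below, $k$ stands for the integer exponent $\frac{1}{2}(m - p) + \beta - 1$ and $c$ for $\frac{1}{2}S_m$; both names, and the particular values $k = 5$ and $c = 3$, are choices made only for this illustration.

```python
import math

def falling(d, r):
    """(d)_r = d (d - 1) ... (d - r + 1), with (d)_0 = 1."""
    out = 1.0
    for i in range(r):
        out *= d - i
    return out

def closed_form(k, c):
    """Right-hand side of (4.3.14) with exponent k and c = S_m / 2."""
    head = math.factorial(k) / c ** (k + 1)
    tail = sum(falling(k, r) / c ** (r + 1) for r in range(k + 1))
    return head - math.exp(-c) * tail

def midpoint_integral(k, c, n_grid=20000):
    """Midpoint-rule approximation of the integral of u^k exp(-c u) over (0, 1)."""
    h = 1.0 / n_grid
    return h * sum(((i + 0.5) * h) ** k * math.exp(-c * (i + 0.5) * h)
                   for i in range(n_grid))

exact = closed_form(5, 3.0)
approx = midpoint_integral(5, 3.0)
print(exact, approx)
```

The two values agree to well within the quadrature error, which supports reading the sum in (4.3.14) as running over $r = 0, \dots, k$ with the factor $(\frac{1}{2}S_m)^{-r-1}$.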


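Accordingly,

$$\big(\tfrac{1}{2}S_m\big)\exp\big(\tfrac{1}{2}S_m\big) \times (\text{lhs of (4.3.14)}) = \big(\tfrac{1}{2}S_m\big)^{-(\frac{1}{2}(m - p) + \beta - 1)}\big(\tfrac{1}{2}(m - p) + \beta - 1\big)!\,\exp\big(\tfrac{1}{2}S_m\big) - \sum_{r=0}^{\frac{1}{2}(m - p) + \beta - 1}\big(\tfrac{1}{2}(m - p) + \beta - 1\big)_r\big(\tfrac{1}{2}S_m\big)^{-r}. \tag{4.3.15}$$

Next, writing $S_m = (\bar{Y}_{(s)} - X_*b_0)^T(I_m - P_{X_*})(\bar{Y}_{(s)} - X_*b_0)/V$ and noting the idempotency of $I_m - P_{X_*}$, it follows that $S_m \sim u_0^{-1}\chi^2_{m - p}$. Now using Markov's inequality, for every $\varepsilon > 0$,

$$P\big(\big|S_m/(u_0^{-1}(m - p)) - 1\big| > \varepsilon\big) \le \varepsilon^{-4}E\big|S_m/(u_0^{-1}(m - p)) - 1\big|^4 = O(m^{-2}). \tag{4.3.16}$$

Application of the Borel-Cantelli Lemma now leads to $S_m/(u_0^{-1}(m - p)) \xrightarrow{a.s.} 1$ as $m \to \infty$. Hence,

$$\begin{aligned}
\sum_{r=0}^{\frac{1}{2}(m - p) + \beta - 1}\big(\tfrac{1}{2}(m - p) + \beta - 1\big)_r\big(\tfrac{1}{2}S_m\big)^{-r}
&\le \sum_{r=0}^{\frac{1}{2}(m - p) + \beta - 1}\Big[\big(\tfrac{1}{2}(m - p) + \beta - 1\big)\big/\big(\tfrac{1}{2}(m - p)\big)\Big]^r\big(S_m/(m - p)\big)^{-r} \\
&\le \sum_{r=0}^{\infty}\Big[\big(\tfrac{1}{2}(m - p) + \beta - 1\big)\big/\big(\tfrac{1}{2}(m - p)\big)\Big]^r\big(S_m/(m - p)\big)^{-r}.
\end{aligned} \tag{4.3.17}$$

Choose $m_0$ so large that $2(\beta - 1)/(m_0 - p) < g_0$, where $1 + g_0 < u_0^{-1}$. Hence, for $m \ge m_0$,

$$\text{rhs of (4.3.17)} \le \sum_{r=0}^{\infty}(1 + g_0)^r\big(S_m/(m - p)\big)^{-r} \xrightarrow{a.s.} \sum_{r=0}^{\infty}\big(u_0(1 + g_0)\big)^r < \infty. \tag{4.3.18}$$

Also, let $\ell(m) = \big(\tfrac{1}{2}S_m\big)^{-(\frac{1}{2}(m - p) + \beta - 1)}\big(\tfrac{1}{2}(m - p) + \beta - 1\big)!\,\exp\big(\tfrac{1}{2}S_m\big)$. Hence,

$$\begin{aligned}
\log\ell(m) &= \tfrac{1}{2}S_m + \sum_{j=2}^{\frac{1}{2}(m - p) + \beta - 1}\log j - \big(\tfrac{1}{2}(m - p) + \beta - 1\big)\log\big(\tfrac{1}{2}S_m\big) \\
&\ge \tfrac{1}{2}S_m + \int_1^{\frac{1}{2}(m - p) + \beta - 1}\log x\,dx - \big(\tfrac{1}{2}(m - p) + \beta - 1\big)\log\big(\tfrac{1}{2}S_m\big) \\
&= \tfrac{1}{2}S_m + \big(\tfrac{1}{2}(m - p) + \beta - 1\big)\log\big(\tfrac{1}{2}(m - p) + \beta - 1\big) - \big(\tfrac{1}{2}(m - p) + \beta - 1\big) + 1 - \big(\tfrac{1}{2}(m - p) + \beta - 1\big)\log\big(\tfrac{1}{2}S_m\big) \\
&= \tfrac{1}{2}S_m + \big(\tfrac{1}{2}(m - p) + \beta - 1\big)\log\Big[1 + \frac{2(\beta - 1)}{m - p}\Big] - \tfrac{1}{2}(m - p) - (\beta - 2) - \big(\tfrac{1}{2}(m - p) + \beta - 1\big)\log\Big[\frac{S_m}{m - p}\Big].
\end{aligned} \tag{4.3.19}$$

Since $S_m/(m - p) \xrightarrow{a.s.} u_0^{-1}$, it follows from (4.3.19) that

$$\liminf_{m \to \infty}m^{-1}\log\ell(m) \ge \tfrac{1}{2}\big(u_0^{-1} - 1 - \log u_0^{-1}\big) > 0 \quad \text{a.s.} \tag{4.3.20}$$

Hence, from (4.3.15) and (4.3.17) - (4.3.20), one gets that the lhs of (4.3.15) $\to \infty$ a.s. as $m \to \infty$. Once again, recalling that $S_m/(m - p) \xrightarrow{a.s.} u_0^{-1}$ as $m \to \infty$, it follows from (4.3.13) and (4.3.14) that $T_m(\bar{Y}_{(s)}; 1, \beta) \xrightarrow{a.s.} u_0$ as $m \to \infty$.

If $m - p$ is odd and $\beta$ is an integer, use the inequality

$$\int_0^1u^{\frac{1}{2}(m - p) + \beta - 1}\exp\big(-\tfrac{1}{2}uS_m\big)\,du \ge \int_0^1u^{\frac{1}{2}(m - p + 1) + \beta - 1}\exp\big(-\tfrac{1}{2}uS_m\big)\,du, \tag{4.3.21}$$

and proceed similarly as before to conclude that $T_m(\bar{Y}_{(s)}; 1, \beta) \xrightarrow{a.s.} u_0$ as $m \to \infty$. Thus, for integer $\beta$, $T_m(\bar{Y}_{(s)}; 1, \beta) \xrightarrow{a.s.} u_0$ as $m \to \infty$.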

For noninteger $\beta$, we make the following observation. Denote $f_{1,\beta}(u \mid \bar{Y}_{(s)})$ by $f_\beta(u)$ for brevity. Hence for $\beta < \beta'$, $f_{\beta'}(u)/f_\beta(u) \propto u^{\beta' - \beta}$, which is increasing in $u$. Using this monotone likelihood ratio (MLR) property, it follows from Lemma 2, part (i) of Lehmann (1986, p. 85) that if $[\beta]$ denotes the largest integer not exceeding $\beta$,

$$E_{1,[\beta]}(U \mid \bar{Y}_{(s)}) \le E_{1,\beta}(U \mid \bar{Y}_{(s)}) \le E_{1,[\beta]+1}(U \mid \bar{Y}_{(s)}),$$

i.e.,

$$T_m(\bar{Y}_{(s)}; 1, [\beta]) \le T_m(\bar{Y}_{(s)}; 1, \beta) \le T_m(\bar{Y}_{(s)}; 1, [\beta] + 1). \tag{4.3.22}$$

Since $T_m(\bar{Y}_{(s)}; 1, [\beta]) \xrightarrow{a.s.} u_0$ and $T_m(\bar{Y}_{(s)}; 1, [\beta] + 1) \xrightarrow{a.s.} u_0$ as $m \to \infty$, so $T_m(\bar{Y}_{(s)}; 1, \beta) \xrightarrow{a.s.} u_0$. This completes the proof of Lemma 4.3.1.
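The convergence asserted in Lemma 4.3.1 can be watched in a small simulation. The sketch below is an illustration under stated assumptions: it draws $S_m$ from its marginal law $u_0^{-1}\chi^2_{m - p}$ using a seeded generator, evaluates the ratio (4.3.11) by a midpoint rule in log space, and uses made-up values $u_0 = 0.4$, $p = 2$ and a noninteger $\beta = 0.5$. The function names are hypothetical.

```python
import math
import random

def t_m(s_m, m, p, alpha, beta, n_grid=20000):
    """Ratio (4.3.11), computed by a midpoint rule with log-space rescaling."""
    k = 0.5 * (m - p) + beta - 1.0
    grid = [(i + 0.5) / n_grid for i in range(n_grid)]
    logs = [k * math.log(u) + (alpha - 1.0) * math.log1p(-u) - 0.5 * u * s_m
            for u in grid]
    top = max(logs)                      # avoids underflow for large m
    weights = [math.exp(v - top) for v in logs]
    return sum(u * w for u, w in zip(grid, weights)) / sum(weights)

def simulate_s_m(m, p, u0, rng):
    """Draw S_m distributed as chi-square with m - p degrees of freedom, divided by u0."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m - p)) / u0

rng = random.Random(0)                   # fixed seed, for repeatability only
u0, p = 0.4, 2
results = {m: t_m(simulate_s_m(m, p, u0, rng), m, p, alpha=1.0, beta=0.5)
           for m in (20, 200, 2000)}
for m, value in results.items():
    print(m, value)
```

For large $m$ the statistic is driven by $(m - p + 2\beta)/S_m$, the first term of (4.3.13), so its distance from $u_0$ is governed by the fluctuation of $S_m/(m - p)$ around $u_0^{-1}$.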

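For a general $\alpha$ $(\ge 1)$, we use Lemma 4.3.1 to prove the following lemma. This lemma plays a vital role in proving Theorem 4.3.1 below.

Lemma 4.3.2. Suppose $\bar{Y}_{(s)} \sim N(X_*b_0, (V + \delta_0)I_m)$. Then $T_m(\bar{Y}_{(s)}; \alpha, \beta)$ as defined in (4.3.11) converges a.s. (as $m \to \infty$) to $u_0$ for $\alpha \ge 1$.

Proof of Lemma 4.3.2. Denote $f_{\alpha,\beta}(u \mid \bar{Y}_{(s)})$ by $f_{\alpha,\beta}(u)$ for convenience. Then for fixed $\beta$ and $0 < \alpha < \alpha'$,

$$f_{\alpha',\beta}(u)/f_{\alpha,\beta}(u) \propto (1 - u)^{\alpha' - \alpha}, \text{ which is decreasing in } u. \tag{4.3.23}$$

Integrating by parts the numerator in (4.3.11), for $\alpha \ge 2$,

$$E_{\alpha,\beta}(U \mid \bar{Y}_{(s)}) = (m - p + 2\beta)/S_m - 2(\alpha - 1)S_m^{-1}E_{\alpha,\beta}\big[U(1 - U)^{-1} \mid \bar{Y}_{(s)}\big]. \tag{4.3.24}$$

Using the MLR property as given in (4.3.23), the fact that $u/(1 - u)$ is increasing in $u$, and Lemma 2(i) of Lehmann (1986, p. 85) once again, it follows that

$$E_{\alpha,\beta}\big[U(1 - U)^{-1} \mid \bar{Y}_{(s)}\big] \le E_{2,\beta}\big[U(1 - U)^{-1} \mid \bar{Y}_{(s)}\big] = \big\{\big[E_{1,\beta}(U \mid \bar{Y}_{(s)})\big]^{-1} - 1\big\}^{-1} \xrightarrow{a.s.} (u_0^{-1} - 1)^{-1} \quad \text{as } m \to \infty \tag{4.3.25}$$

from Lemma 4.3.1. Since $S_m/(m - p) \xrightarrow{a.s.} u_0^{-1}$ as $m \to \infty$, it follows from (4.3.24), (4.3.25), (4.3.7) and (4.3.11) that

$$T_m(\bar{Y}_{(s)}; \alpha, \beta) = E_{\alpha,\beta}(U \mid \bar{Y}_{(s)}) \xrightarrow{a.s.} u_0 \tag{4.3.26}$$

as $m \to \infty$ for $\alpha \ge 2$. Thus, for $\alpha \ge 2$, $\lim_{m \to \infty}T_m(\bar{Y}_{(s)}; \alpha, \beta) = u_0$ a.s. Now, for $1 < \alpha < 2$, using the MLR property once again,

$$E_{2,\beta}(U \mid \bar{Y}_{(s)}) \le E_{\alpha,\beta}(U \mid \bar{Y}_{(s)}) \le E_{1,\beta}(U \mid \bar{Y}_{(s)}). \tag{4.3.27}$$

Since both $E_{1,\beta}(U \mid \bar{Y}_{(s)}) = T_m(\bar{Y}_{(s)}; 1, \beta)$ and $E_{2,\beta}(U \mid \bar{Y}_{(s)}) = T_m(\bar{Y}_{(s)}; 2, \beta)$ converge a.s. to $u_0$ as $m \to \infty$, it follows from (4.3.27) that

$$T_m(\bar{Y}_{(s)}; \alpha, \beta) = E_{\alpha,\beta}(U \mid \bar{Y}_{(s)}) \xrightarrow{a.s.} u_0 \quad \text{as } m \to \infty \text{ for } 1 < \alpha < 2.$$

Thus $T_m(\bar{Y}_{(s)}; \alpha, \beta) \xrightarrow{a.s.} u_0$ as $m \to \infty$ for all $\alpha \ge 1$, and the proof of Lemma 4.3.2 is complete.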


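Remark 4.3.1. For $0 < \alpha < 1$ we conjecture that Lemma 4.3.2 is true. This is because

$$\big[1 - E_{\alpha,\beta}(U \mid \bar{Y}_{(s)})\big]^{-1} = E_{\alpha+1,\beta}\big[(1 - U)^{-1} \mid \bar{Y}_{(s)}\big] = 1 + \sum_{j=1}^{\infty}E_{\alpha+1,\beta}(U^j \mid \bar{Y}_{(s)}).$$

Hence, by Lemma 4.3.2, for any fixed $j$,

$$E_{\alpha+1,\beta}(U^j \mid \bar{Y}_{(s)}) = \prod_{i=0}^{j-1}E_{\alpha+1,\beta+i}(U \mid \bar{Y}_{(s)}) \to u_0^j \quad \text{a.s. as } m \to \infty.$$

Hence for each $\ell = 1, 2, \dots$,

$$\lim_{m \to \infty}\sum_{j=1}^{\ell}E_{\alpha+1,\beta}(U^j \mid \bar{Y}_{(s)}) = \sum_{j=1}^{\ell}u_0^j \quad \text{a.s.}$$

Finally, making $\ell \to \infty$ we have

$$\lim_{\ell \to \infty}\lim_{m \to \infty}\sum_{j=1}^{\ell}E_{\alpha+1,\beta}(U^j \mid \bar{Y}_{(s)}) = \frac{u_0}{1 - u_0} \quad \text{a.s.}$$

Hence, if we can change the order of the above iterated limits, then

$$\lim_{m \to \infty}\sum_{j=1}^{\infty}E_{\alpha+1,\beta}(U^j \mid \bar{Y}_{(s)}) = \frac{u_0}{1 - u_0} \quad \text{a.s.},$$

or equivalently

$$\lim_{m \to \infty}E_{\alpha,\beta}(U \mid \bar{Y}_{(s)}) = u_0 \quad \text{a.s.}$$

However, one has to justify the change in the order of these limits.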

We  now  turn  to  the  main  theorem  of  this  section. 
Theorem  4.3.1  below  proves  the  A.O.  property  of  Cgj  defined 
in  (4.3.6). 

Theorem  4.3.1  .   Suppose  Y,  .  ~  N^X^Bq,  (V  +  (5o)Im)-   Assume 
the  condition  (4.2.2.1).   Then,  if  a  >  1  ,  rg  ^ttq  ,  Cgjj  - 


as  m  — ►  oo , 


Proof  of  Theorem  4.3.1.   From  (4.3.10),  we  have 


Qm('^0'  ^Bl)  -  "^Qml^O'  ^Sb) 


m-^E^ 


T 
'^Bi(Y(3))  -  SSb(^(s)))  9m(eBl(Y(3))  "  ^Sb{^~  ^^)] 


^emma  4.2.2.1 


<  m-lchL(gm)E*|eBi(Y(^))  -  e*B(Y(3))|  ,  by  U 

lch^(Q,)E*|(Y(^)-P^J^^jE^^^(u|Y(3))  -  Uo(Y^^)-X.bo)|  , 


=  m 


by  (4.3.6)  and  (4.3.8) 


=  m 


-lchL(Qm)EJ(Y^3)-P^J(3))(E,,^(U|Y(^^)-Uo)-Uo(p^J(,)-X.bo)[ 


m  ^chL(Qm) 


E*{(e«  ,  ^(U  I  Y  (,))-Uo)  Y'[3)(l'n-eX  Jy  (s)}  +  -0^*\^~xj  (s)-^*^of 


(4.3.28) 


The last equality follows since $E_{\alpha,\beta}(U \mid Y_{(s)})$ is a function
of $Y_{(s)}^T(I_m - P_{X_*})Y_{(s)}$, and hence
$(E_{\alpha,\beta}(U \mid Y_{(s)}) - u_0)^2\, Y_{(s)}^T(I_m - P_{X_*})Y_{(s)}$ is a function of
$(I_m - P_{X_*})Y_{(s)}$, which is distributed independently of $P_{X_*}Y_{(s)}$ under the
distribution $m_{\pi_0}(y_{(s)})$.

Also, since under $m_{\pi_0}(y_{(s)})$, $Y_{(s)}^T(I_m - P_{X_*})Y_{(s)} \sim (V + \delta_0)\chi^2_{m-p}$,
$(m - p)^{-1} Y_{(s)}^T(I_m - P_{X_*})Y_{(s)} \to V + \delta_0$ a.s. Hence,
by Lemma 4.3.2, $(E_{\alpha,\beta}(U \mid Y_{(s)}) - u_0)^2\, m^{-1} Y_{(s)}^T(I_m - P_{X_*})Y_{(s)} \to 0$ a.s.
as $m \to \infty$ under $m_{\pi_0}(y_{(s)})$. Also, $|E_{\alpha,\beta}(U \mid Y_{(s)}) - u_0| \le 1$
and $m^{-1} Y_{(s)}^T(I_m - P_{X_*})Y_{(s)}$ is uniformly integrable in m.
Then $m^{-1} E_*\big[(E_{\alpha,\beta}(U \mid Y_{(s)}) - u_0)^2\, Y_{(s)}^T(I_m - P_{X_*})Y_{(s)}\big] \to 0$ as
$m \to \infty$. Also, $E_*\|P_{X_*}Y_{(s)} - X_* b_0\|^2 = (V + \delta_0)p$. Now from
(4.3.28) and the condition (4.2.2.1), we have


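\[
\begin{aligned}
0 &\le \lim_{m\to\infty}\big[r_{Q_m}(\pi_0, e_{BI}) - r_{Q_m}(\pi_0, e_{SB})\big] \\
&\le \limsup_{m\to\infty}\mathrm{ch}_L(Q_m)\Big[\lim_{m\to\infty} m^{-1} E_*\big\{(E_{\alpha,\beta}(U \mid Y_{(s)}) - u_0)^2\, Y_{(s)}^T(I_m - P_{X_*})Y_{(s)}\big\}
 + \lim_{m\to\infty} m^{-1}(V + \delta_0)\,p\,u_0^2\Big] \\
&= 0.
\end{aligned}
\]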


Therefore $r_{Q_m}(\pi_0, e_{BI}) - r_{Q_m}(\pi_0, e_{SB}) \to 0$ as $m \to \infty$ and the
proof of Theorem 4.3.1 is complete.


Now we will return to the prediction of the finite
population mean vector $\gamma$. From I - III, it follows that the
HB predictor of $\gamma$ is given by


\[
e_{BF}(Y_{(s)}) = \mathrm{Diag}(1 - f_1, \ldots, 1 - f_m)\,Y_{(s)} + \mathrm{Diag}(f_1, \ldots, f_m)\,e_{BI}(Y_{(s)}).
\tag{4.3.29}
\]


And from I it follows that the subjective Bayes predictor
of $\gamma$ under the prior $\pi_0$ is given by


-SB 


(Y(3))  =  Diag(l-f,,...,  l-fm)Y(^) 

+  Diag(f^,...,  fm)egB(V(s))- 


(4.3.30) 


We will now show the asymptotic optimality of the HB
predictor $e_{BF}(Y_{(s)})$. From (4.2.1.1),

rp 

|Diag(f,,...,  fm)(eBi(Y^^^)  -  e*B(Y(,)))}  Qm 


=  m-^E^ 


X  |Diag(f,,...,  f,)(^eBj(Y(^))  -  ^^q{Y ^,y] 
<    m-lchL(qm)chL(Diag(f2,...,  f2)) 


X  E" 


-1 


^bi(Y(,))  -  ^SB(Y(3))f] 


(by  Lemma  4.2.1.1) 


<  m  ^chL(Qm)E' 


hl(Y(s))  -  ^SB(V(s))f] 
since $f_i^2 \le 1$ for all i = 1,..., m. From this we have, as in
Theorem 4.3.1, that the Bayes risk of the HB predictor $e_{BF}$ of $\gamma$ under $\pi_0$
exceeds that of the subjective Bayes predictor by an amount that tends to 0 as
$m \to \infty$.

4.4   Asymptotic  Optimality  with  Unknown 
Variance  Components 

In  this  section,  we  dispense  with  the  assumption  of  a 
known  variance  component  at  the  first  stage,  and  consider  a 
hierarchical  model  similar  to  the  one  considered  in  Section 
4.3. 

I. Conditional on $\Theta = \theta$, $B = b$, $R = r$ and $\Lambda = \lambda$,
$Y_i \sim N(\theta_i 1_{N_i}, r^{-1} I_{N_i})$, i = 1,..., m independently;

II. conditional on $B = b$, $R = r$ and $\Lambda = \lambda$,
$\theta \sim N(X_* b, (\lambda r)^{-1} I_m)$, where $X_*$ (m x p) is assumed to have
rank p < m;

III. B, R and $\varepsilon = \Lambda R$ are mutually independently
distributed with $B \sim$ uniform($R^p$), $\varepsilon$ has pdf
$f(\varepsilon) \propto \varepsilon^{-\alpha-1}$ $(0 < \alpha < \tfrac{1}{2}(m-p))$ and
$R \sim$ gamma$(\tfrac{1}{2}a_0, \tfrac{1}{2}g_0)$, that is, R has pdf
$h(r) \propto \exp(-\tfrac{1}{2}a_0 r)\, r^{\frac{1}{2}g_0 - 1}$, where $a_0 \ge 0$,
and $g_0$ (real) satisfies $(n-1)m + g_0 > 0$, where n is
the sample size from each stratum. Thus, $\varepsilon$ has an
improper gamma pdf, while R has a proper or an
improper gamma pdf with its parameters satisfying
certain conditions.
Here also we are interested in proving the A.O. property of
the HB predictor of $\gamma$. We will attain this by proving the
asymptotic optimality of the HB predictor of $\theta$. Assume
that we have a sample $Y_{i1}, \ldots, Y_{in}$ of size n from the i-th
stratum. Let $Y_{(s)} = (\bar{Y}_{1(s)}, \ldots, \bar{Y}_{m(s)})^T$ where
$\bar{Y}_{i(s)} = n^{-1}\sum_{j=1}^{n} Y_{ij}$, i = 1,..., m, and
$S = \sum_{i=1}^{m}\sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{i(s)})^2$. From
I, it easily follows that $(Y_{(s)}, S)$ is minimal sufficient
for $(\theta, r)$. So our posterior distribution will depend
only on $(y_{(s)}, s)$.

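Using I - III with routine calculations, one obtains
the following results:

(i) conditional on $Y_{(s)} = y_{(s)}$, $S = s$, $R = r$ and $\Lambda = \lambda$,
\[
\theta \sim N\big[(n+\lambda)^{-1}(n y_{(s)} + \lambda P_{X_*} y_{(s)}),\ (nr)^{-1}(n+\lambda)^{-1}(n I_m + \lambda P_{X_*})\big];
\tag{4.4.1}
\]

(ii) conditional on $Y_{(s)} = y_{(s)}$, $S = s$ and $\Lambda = \lambda$,
\[
R \sim \mathrm{Gamma}\Big(\tfrac{1}{2}\big(a_0 + s + \tfrac{n\lambda}{n+\lambda}\, y_{(s)}^T(I_m - P_{X_*})y_{(s)}\big),\ \tfrac{1}{2}(nm - p - 2\alpha + g_0)\Big);
\]

(iii) conditional on $Y_{(s)} = y_{(s)}$ and $S = s$, $\Lambda$ has pdf
\[
f(\lambda \mid y_{(s)}, s) \propto \Big(\frac{\lambda}{n+\lambda}\Big)^{\frac{1}{2}(m-p)} \lambda^{-\alpha-1}
\Big[a_0 + s + \frac{n\lambda}{n+\lambda}\, y_{(s)}^T(I_m - P_{X_*})y_{(s)}\Big]^{-\frac{1}{2}(nm-p-2\alpha+g_0)}.
\tag{4.4.2}
\]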
Let $U = \Lambda/(n + \Lambda)$. Then, from (4.4.2) it follows that the
conditional pdf of U given $Y_{(s)} = y_{(s)}$ and $S = s$ is
\[
f_\alpha(u \mid y_{(s)}, s) \propto u^{\frac{1}{2}(m-p-2\alpha)-1}(1-u)^{\alpha-1}(1 + uF)^{-\frac{1}{2}(nm-p-2\alpha+g_0)},
\tag{4.4.3}
\]
$0 < u < 1$, where $F = n\, y_{(s)}^T(I_m - P_{X_*})y_{(s)}/(a_0 + s)$. Note
that if $a_0 = 0$, F is a multiple of a usual F statistic.
Also, from (4.4.1) and (4.4.3), one gets the HB estimator
of $\theta$ under the loss (4.2.2) given by


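\[
e_{HB}(Y_{(s)}, S) = E[\theta \mid Y_{(s)}, S] = Y_{(s)} - E_\alpha(U \mid Y_{(s)}, S)\,(I_m - P_{X_*})Y_{(s)},
\tag{4.4.4}
\]
where
\[
E_\alpha(U \mid Y_{(s)}, S) =
\frac{\int_0^1 u^{\frac{1}{2}(m-p-2\alpha)}(1-u)^{\alpha-1}(1+uF)^{-\frac{1}{2}(nm-p-2\alpha+g_0)}\,du}
     {\int_0^1 u^{\frac{1}{2}(m-p-2\alpha)-1}(1-u)^{\alpha-1}(1+uF)^{-\frac{1}{2}(nm-p-2\alpha+g_0)}\,du}.
\tag{4.4.5}
\]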
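The ratio of integrals in (4.4.5) has no closed form in general, but it is easy to approximate numerically. The following is a minimal sketch, not part of the original text: the function name `posterior_mean_u`, the midpoint rule and the grid size are illustrative assumptions. It evaluates the kernel of (4.4.3) on the log scale so that large m does not cause underflow.

```python
import math

def posterior_mean_u(m, p, n, alpha, g0, F, grid=20000):
    """Midpoint-rule approximation to E_alpha(U | Y_(s), S) in (4.4.5).

    Hypothetical helper for illustration; F is the statistic defined
    below (4.4.3), supplied here as a plain number.
    """
    a = 0.5 * (m - p - 2 * alpha)           # power of u in the numerator
    c = 0.5 * (n * m - p - 2 * alpha + g0)  # power of (1 + uF)
    h = 1.0 / grid
    us = [(k + 0.5) * h for k in range(grid)]
    # Log of the denominator integrand, i.e. the kernel of (4.4.3).
    logs = [(a - 1) * math.log(u) + (alpha - 1) * math.log(1 - u)
            - c * math.log(1 + u * F) for u in us]
    top = max(logs)                          # rescale to avoid underflow
    w = [math.exp(v - top) for v in logs]
    # Numerator integrand = u * denominator integrand.
    return sum(u * wi for u, wi in zip(us, w)) / sum(w)

if __name__ == "__main__":
    # F set to its almost-sure limit 1/((n-1)u0) with n = 5, u0 = 0.4.
    print(posterior_mean_u(m=2000, p=2, n=5, alpha=1, g0=0.0, F=0.625))
```

With F fixed at the value it approaches under the subjective prior, the result should be close to $u_0$ for large m, which is the behavior the lemmas of this section establish.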
We now examine the asymptotic (as $m \to \infty$) behavior of $e_{HB}$
as an estimator of $\theta$ under the subjective prior $\pi_0$ which
specifies the value $r_0$ for r and the distribution
$\theta \sim N(X_* b_0, (\lambda_0 r_0)^{-1} I_m)$, and under the loss $L(\theta, a)$ given in
(4.2.2). Then, marginally $Y_{(s)}$ and S are mutually
independent with $Y_{(s)} \sim N(X_* b_0, r_0^{-1}(n^{-1} + \lambda_0^{-1}) I_m)$ and
$S \sim r_0^{-1}\chi^2_{(n-1)m}$. We prove a series of lemmas culminating
eventually in Theorem 4.4.1 which establishes the A.O.
property of $e_{HB}$ as an estimator of $\theta$ under the general
quadratic loss given in (4.2.2).

Denote the expectation $E_\alpha(U \mid Y_{(s)}, S)$ given in (4.4.5)
by $W_m(Y_{(s)}, S; \alpha)$. We subscript it by m so that we can
consider the sequence $\{W_m(Y_{(s)}, S; \alpha): m \ge 1\}$.

First the following lemma is proved. Then this lemma
will be used to prove a general lemma for $\alpha > 1$.

Lemma 4.4.1. Let $u_0 = \lambda_0/(n + \lambda_0)$. Consider the case
$\alpha = 1$. Then, as $m \to \infty$, $W_m(Y_{(s)}, S; 1) \to u_0$ a.s., where
$Y_{(s)}$ and S are mutually independent with
$Y_{(s)} \sim N(X_* b_0, r_0^{-1}(n^{-1} + \lambda_0^{-1}) I_m)$ and $S \sim r_0^{-1}\chi^2_{(n-1)m}$.


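Proof of Lemma 4.4.1. Assume m is so large that $t_1 = \tfrac{1}{2}(m - p - 2) > 1$
and $t_2 = \tfrac{1}{2}((n - 1)m + g_0) > 1$. Since
$\alpha = 1$, it follows from (4.4.5) that

\[
\begin{aligned}
W_m(Y_{(s)}, S; 1)
&= \int_0^1 u^{t_1}(1+uF)^{-(t_1+t_2)}\,du \Big/ \int_0^1 u^{t_1-1}(1+uF)^{-(t_1+t_2)}\,du \\
&= F^{-1}\,\frac{\int_0^1 \big(\tfrac{uF}{1+uF}\big)^{t_1}\big(\tfrac{1}{1+uF}\big)^{t_2-2}\,\tfrac{F}{(1+uF)^2}\,du}
               {\int_0^1 \big(\tfrac{uF}{1+uF}\big)^{t_1-1}\big(\tfrac{1}{1+uF}\big)^{t_2-1}\,\tfrac{F}{(1+uF)^2}\,du} \\
&= F^{-1}\int_0^{F/(1+F)} v^{t_1}(1-v)^{t_2-2}\,dv \Big/ \int_0^{F/(1+F)} v^{t_1-1}(1-v)^{t_2-1}\,dv.
\end{aligned}
\tag{4.4.6}
\]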


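Integration by parts gives

\[
\begin{aligned}
\int_0^{F/(1+F)} v^{t_1}(1-v)^{t_2-2}\,dv
&= -\Big(\frac{F}{1+F}\Big)^{t_1}\Big(\frac{1}{1+F}\Big)^{t_2-1}(t_2-1)^{-1} \\
&\quad + \frac{t_1}{t_2-1}\int_0^{F/(1+F)} v^{t_1-1}(1-v)^{t_2-1}\,dv.
\end{aligned}
\tag{4.4.7}
\]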

Combining (4.4.6) and (4.4.7), one gets

\[
W_m(Y_{(s)}, S; 1) = \frac{t_1}{(t_2-1)F}
 - (t_2-1)^{-1} F^{t_1-1}(1+F)^{-(t_1+t_2-1)}\Big[\int_0^{F/(1+F)} v^{t_1-1}(1-v)^{t_2-1}\,dv\Big]^{-1}.
\tag{4.4.8}
\]
Note that since $S \sim r_0^{-1}\chi^2_{(n-1)m}$, $S/((n-1)m) \to r_0^{-1}$ a.s. as
$m \to \infty$. Again, using the idempotency of $I_m - P_{X_*}$, it
follows that $n Y_{(s)}^T(I_m - P_{X_*})Y_{(s)} = n(Y_{(s)} - X_* b_0)^T(I_m - P_{X_*})(Y_{(s)} - X_* b_0) \sim r_0^{-1}u_0^{-1}\chi^2_{m-p}$.
Hence, $n Y_{(s)}^T(I_m - P_{X_*})Y_{(s)}/m \to r_0^{-1}u_0^{-1}$ a.s. as $m \to \infty$. Thus,
$t_1/((t_2 - 1)F) = \{(m - p - 2)/((n - 1)m + g_0 - 2)\}F^{-1} \to (n-1)^{-1} \times (n-1)\,r_0^{-1}/(r_0^{-1}u_0^{-1}) = u_0$
a.s. as $m \to \infty$. Hence, for proving
Lemma 4.4.1, it suffices to show that the second term in
the rhs of (4.4.8) converges to zero a.s. as $m \to \infty$ under
the distribution of $Y_{(s)}$ and S given in the lemma.
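The almost-sure limit of the leading term $t_1/((t_2-1)F)$ can be illustrated by simulation. The sketch below is an illustration under stated assumptions, not part of the original argument: it draws the two chi-square quantities directly from the distributions stated in the text (using the gamma representation of the chi-square), and the function name `leading_term` and all parameter values are hypothetical.

```python
import random

def leading_term(m, n, p, g0, a0, r0, lam0, rng):
    """Simulate t1/((t2 - 1)F) under the subjective prior; returns (value, u0)."""
    u0 = lam0 / (n + lam0)
    # n * Y'(I - P)Y ~ (r0 * u0)^{-1} * chi^2_{m-p}
    quad = rng.gammavariate((m - p) / 2.0, 2.0) / (r0 * u0)
    # S ~ r0^{-1} * chi^2_{(n-1)m}
    s = rng.gammavariate((n - 1) * m / 2.0, 2.0) / r0
    F = quad / (a0 + s)
    t1 = 0.5 * (m - p - 2)
    t2 = 0.5 * ((n - 1) * m + g0)
    return t1 / ((t2 - 1) * F), u0

if __name__ == "__main__":
    rng = random.Random(0)
    for m in (100, 10000, 200000):
        value, u0 = leading_term(m, 5, 2, 1.0, 1.0, 2.0, 1.5, rng)
        print(m, value, u0)
```

As m grows, the simulated value should settle near $u_0 = \lambda_0/(n+\lambda_0)$, which is what the argument above asserts for the first term of (4.4.8).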

With this end, assume first that $t_1$ is a positive
integer, and use successive integration by parts to obtain

\[
\int_0^{F/(1+F)} v^{t_1-1}(1-v)^{t_2-1}\,dv
= -(1+F)^{-(t_1+t_2-1)}\sum_{j=0}^{t_1-1}\frac{(t_1-1)_j\,F^{t_1-j-1}}{t_2(t_2+1)\cdots(t_2+j)}
 + \frac{(t_1-1)!\,\Gamma(t_2)}{\Gamma(t_1+t_2)},
\tag{4.4.9}
\]

where $(x)_0 = 1$, $(x)_j = x(x - 1) \cdots (x - j + 1)$ for j =
1, 2,..., x and $\Gamma(\cdot)$ is the usual gamma function. Hence,
from (4.4.9),
\[
\begin{aligned}
& (t_2-1)(1+F)^{t_1+t_2-1}F^{-(t_1-1)}\int_0^{F/(1+F)} v^{t_1-1}(1-v)^{t_2-1}\,dv \\
&\quad = -(t_2-1)\sum_{j=0}^{t_1-1}\frac{(t_1-1)_j\,F^{-j}}{t_2(t_2+1)\cdots(t_2+j)}
   + (1+F)^{t_1+t_2-1}F^{-(t_1-1)}(t_2-1)\frac{(t_1-1)!\,\Gamma(t_2)}{\Gamma(t_1+t_2)} \\
&\quad \ge -(t_2-1)\sum_{j=0}^{t_1-1}\frac{(t_1-1)^j\,F^{-j}}{t_2^{\,j+1}}
   + (1+F)^{t_1+t_2-1}F^{-(t_1-1)}(t_2-1)\frac{(t_1-1)!\,\Gamma(t_2)}{\Gamma(t_1+t_2)} \\
&\quad \ge -(t_2-1)\,t_2^{-1}\sum_{j=0}^{\infty}\{(t_1-1)/t_2\}^j F^{-j}
   + (1+F)^{t_1+t_2-1}F^{-(t_1-1)}(t_2-1)\frac{(t_1-1)!\,\Gamma(t_2)}{\Gamma(t_1+t_2)}.
\end{aligned}
\tag{4.4.10}
\]


Note that the first term in the rhs of (4.4.10) converges
a.s. to $-\sum_{j=0}^{\infty}(n-1)^{-j}((n-1)u_0)^j = -\sum_{j=0}^{\infty}u_0^j = -(1 - u_0)^{-1}$
as $m \to \infty$. Also, using Stirling's bounds for gamma
functions,

\[
\begin{aligned}
& \text{second term in the rhs of (4.4.10)} \\
&\quad \ge (1+F)^{t_1+t_2-1}F^{-(t_1-1)}(2\pi)^{1/2}(t_2-1) \\
&\qquad \times \frac{\exp(-(t_1-1))(t_1-1)^{t_1-\frac{1}{2}}\,\exp(-(t_2-1))(t_2-1)^{t_2-\frac{1}{2}}}
                    {\exp(-(t_1+t_2-1))(t_1+t_2-1)^{t_1+t_2-\frac{1}{2}}\,\exp\big(1/(12(t_1+t_2-1))\big)} \\
&\quad = (1+F)^{t_1+t_2-1}F^{-(t_1-1)}(2\pi)^{1/2}(t_2-1)\,e\,(t_1-1)^{t_1-\frac{1}{2}}(t_2-1)^{t_2-\frac{1}{2}} \\
&\qquad \times (t_1+t_2-1)^{-(t_1+t_2-\frac{1}{2})}\exp\Big(-\frac{1}{12(t_1+t_2-1)}\Big).
\end{aligned}
\tag{4.4.11}
\]


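Recall that $t_1 = \tfrac{1}{2}(m - p - 2)$, $t_2 = \tfrac{1}{2}((n - 1)m + g_0)$, and
$F \to (n - 1)^{-1}u_0^{-1}$ a.s. as $m \to \infty$. Write h(m) for the
logarithm of the rhs of (4.4.11). Then,

\[
\begin{aligned}
\lim_{m\to\infty} m^{-1}h(m)
&\overset{a.s.}{=} \tfrac{1}{2}n\log\Big(1 + \frac{u_0^{-1}}{n-1}\Big) - \tfrac{1}{2}\log\Big(\frac{u_0^{-1}}{n-1}\Big)
 + \tfrac{1}{2}(n-1)\log(n-1) - \tfrac{1}{2}n\log n \\
&= \tfrac{1}{2}\big[n\log(n - 1 + u_0^{-1}) - \log u_0^{-1} - n\log n\big] \\
&= \tfrac{1}{2}\Big[n\log\Big(1 + \frac{u_0^{-1}-1}{n}\Big) - \log u_0^{-1}\Big] \\
&= \tfrac{1}{2}\log\Big[\Big(1 + \frac{u_0^{-1}-1}{n}\Big)^{n}\Big/u_0^{-1}\Big] > 0,
\end{aligned}
\tag{4.4.12}
\]

using $(1 + x)^n > 1 + nx$ for $0 < x$. Hence $h(m) \to \infty$ a.s.
as $m \to \infty$, and it follows from (4.4.11) and (4.4.12)
that the second term in the rhs of (4.4.8) converges to
zero a.s. as $m \to \infty$.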
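The finite-sum expression (4.4.9) for the incomplete beta integral, on which the integer case rests, can be checked numerically. The following is a small sketch under illustrative assumptions (the function names, the midpoint rule and the test values of $t_1$, $t_2$, F are not from the original text); it compares direct quadrature of the integral with the series obtained by repeated integration by parts.

```python
import math

def falling(x, j):
    """Falling factorial (x)_j = x(x-1)...(x-j+1), with (x)_0 = 1."""
    out = 1.0
    for i in range(j):
        out *= (x - i)
    return out

def incomplete_beta_quadrature(t1, t2, F, grid=100000):
    """Midpoint rule for the integral of v^(t1-1) (1-v)^(t2-1) over (0, F/(1+F))."""
    x = F / (1 + F)
    h = x / grid
    return h * sum(((k + 0.5) * h) ** (t1 - 1) * (1 - (k + 0.5) * h) ** (t2 - 1)
                   for k in range(grid))

def incomplete_beta_series(t1, t2, F):
    """Right-hand side of (4.4.9); t1 must be a positive integer."""
    total = 0.0
    for j in range(t1):
        denom = 1.0
        for i in range(j + 1):
            denom *= (t2 + i)            # t2 (t2+1) ... (t2+j)
        total += falling(t1 - 1, j) * F ** (t1 - j - 1) / denom
    const = math.factorial(t1 - 1) * math.gamma(t2) / math.gamma(t1 + t2)
    return -(1 + F) ** (-(t1 + t2 - 1)) * total + const

if __name__ == "__main__":
    print(incomplete_beta_quadrature(4, 3.5, 0.8), incomplete_beta_series(4, 3.5, 0.8))
```

The two printed values should agree up to quadrature error, for any positive integer $t_1$ and any $t_2 > 0$, $F > 0$.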

If $t_1$ is not an integer, write

\[
h_{t_1}(u) = u^{t_1-1}(1+uF)^{-(t_1+t_2)} \Big/ \int_0^1 u^{t_1-1}(1+uF)^{-(t_1+t_2)}\,du.
\]

Then, for $t_1 < t_1'$, $h_{t_1'}(u)/h_{t_1}(u) \propto (u/(1+uF))^{t_1'-t_1}$ is $\uparrow$ in u.
Using this MLR property, and writing $W_m(Y_{(s)}, S; 1)$ as
$E_{t_1}(U \mid Y_{(s)}, S)$, it follows that $E_{t_1}(U \mid Y_{(s)}, S)$ is $\uparrow$ in $t_1$,
where $E_{t_1}$ denotes the expectation w.r.t. $h_{t_1}(u)$. Hence, if
$[t_1]$ denotes the integer part of $t_1$, since $[t_1] \ge 1$,

\[
E_{[t_1]}(U \mid Y_{(s)}, S) \le E_{t_1}(U \mid Y_{(s)}, S) \le E_{[t_1]+1}(U \mid Y_{(s)}, S).
\]

Since both $E_{[t_1]}(U \mid Y_{(s)}, S)$ and $E_{[t_1]+1}(U \mid Y_{(s)}, S)$ converge
a.s. to $u_0$ as $m \to \infty$, it follows that $E_{t_1}(U \mid Y_{(s)}, S) \to u_0$ a.s. as
$m \to \infty$. The proof of Lemma 4.4.1 is complete.


Lemma 4.4.2. Consider the set up of Lemma 4.4.1 with
$\alpha > 1$. Then $W_m(Y_{(s)}, S; \alpha) \to u_0$ a.s. as $m \to \infty$.


Proof of Lemma 4.4.2. Assume m is large enough so that
$t_3 = \tfrac{1}{2}(m - p - 2\alpha) > 0$. Now,

\[
W_m(Y_{(s)}, S; \alpha)
= \int_0^1 u^{t_3}(1-u)^{\alpha-1}(1+uF)^{-(t_3+t_2)}\,du \Big/ \int_0^1 u^{t_3-1}(1-u)^{\alpha-1}(1+uF)^{-(t_3+t_2)}\,du.
\tag{4.4.13}
\]


Consider first the case when $\alpha \ge 2$. Integration by parts
gives

\[
\begin{aligned}
& \int_0^1 u^{t_3}(1-u)^{\alpha-1}(1+uF)^{-(t_3+t_2)}\,du \\
&\quad = \{t_3/(t_3+t_2-1)\}F^{-1}\int_0^1 u^{t_3-1}(1-u)^{\alpha-1}(1+uF)^{-(t_3+t_2)+1}\,du \\
&\qquad - \{(\alpha-1)/(t_3+t_2-1)\}F^{-1}\int_0^1 u^{t_3}(1-u)^{\alpha-2}(1+uF)^{-(t_3+t_2)+1}\,du \\
&\quad = \{t_3/(t_3+t_2-1)\}F^{-1}\int_0^1 u^{t_3-1}(1-u)^{\alpha-1}(1+uF)^{-(t_3+t_2)}\,du \\
&\qquad + \{t_3/(t_3+t_2-1)\}\int_0^1 u^{t_3}(1-u)^{\alpha-1}(1+uF)^{-(t_3+t_2)}\,du \\
&\qquad - \{(\alpha-1)/(t_3+t_2-1)\}F^{-1}\int_0^1 u^{t_3}(1-u)^{\alpha-2}(1+uF)^{-(t_3+t_2)+1}\,du.
\end{aligned}
\tag{4.4.14}
\]


Combining (4.4.13) and (4.4.14) one gets

\[
W_m(Y_{(s)}, S; \alpha) = (t_3/(t_2-1))F^{-1} - \{(\alpha-1)/(t_2-1)\}F^{-1}\,
E_\alpha\big[U(1+UF)/(1-U) \mid Y_{(s)}, S\big],
\tag{4.4.15}
\]

where recall that $E_\alpha$ is the expectation w.r.t. $f_\alpha$ given in
(4.4.3). Hence, from (4.4.15),

\[
W_m(Y_{(s)}, S; \alpha) \le (t_3/(t_2-1))F^{-1} \to u_0 \ \text{a.s. as } m \to \infty.
\tag{4.4.16}
\]

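Also, writing $f_\alpha(u \mid Y_{(s)}, S)$ as $f_\alpha(u)$, for $\alpha < \alpha'$,
$f_{\alpha'}(u)/f_\alpha(u) \propto [(u^{-1} + F)(1 - u)]^{\alpha'-\alpha}$ is $\downarrow$ in u. Hence, since
$u/(1 - u)$ is $\uparrow$ in u, for $\alpha \ge 2$ (with $t_3$ below evaluated at $\alpha = 2$),

\[
\begin{aligned}
E_\alpha\big[U/(1-U) \mid Y_{(s)}, S\big]
&\le E_2\big[U/(1-U) \mid Y_{(s)}, S\big] \\
&= \int_0^1 u^{t_3}(1+uF)^{-(t_3+t_2)}\,du \Big/ \int_0^1 u^{t_3-1}(1-u)(1+uF)^{-(t_3+t_2)}\,du \\
&\le \int_0^1 u^{t_3-1}(1+uF)^{-(t_3+t_2)}\,du \Big/ \int_0^1 u^{t_3-1}(1-u)(1+uF)^{-(t_3+t_2)}\,du \\
&= \big[E_1\big((1-U) \mid Y_{(s)}, S\big)\big]^{-1} \to (1-u_0)^{-1} \ \text{a.s. as } m \to \infty.
\end{aligned}
\tag{4.4.17}
\]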


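From (4.4.15) and (4.4.17),

\[
W_m(Y_{(s)}, S; \alpha) \ge (t_3/(t_2-1))F^{-1} - \{(\alpha-1)/(t_2-1)\}F^{-1}(1+F)\big[E_1\{(1-U) \mid Y_{(s)}, S\}\big]^{-1}
\to u_0 - 0 = u_0 \ \text{a.s. as } m \to \infty.
\tag{4.4.18}
\]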

It follows from (4.4.16) and (4.4.18) that for $\alpha \ge 2$,
$W_m(Y_{(s)}, S; \alpha) \to u_0$ a.s. as $m \to \infty$. For $1 < \alpha < 2$, use the
inequality $E_2(U \mid Y_{(s)}, S) \le E_\alpha(U \mid Y_{(s)}, S) \le E_1(U \mid Y_{(s)}, S)$ and
use the fact that both $E_1(U \mid Y_{(s)}, S) = W_m(Y_{(s)}, S; 1)$ and
$E_2(U \mid Y_{(s)}, S) = W_m(Y_{(s)}, S; 2)$ converge a.s. to $u_0$ as $m \to \infty$
to conclude that $W_m(Y_{(s)}, S; \alpha) \to u_0$ a.s. as $m \to \infty$. The
proof of Lemma 4.4.2 is complete.
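The identity (4.4.15), obtained from the integration by parts in (4.4.14), lends itself to a direct numerical check. The sketch below is illustrative only: the function names, the midpoint rule and the parameter values are assumptions, and it requires $\alpha \ge 2$ so that the expectation on the right-hand side is finite.

```python
def integral(f, grid=50000):
    """Midpoint rule on (0, 1)."""
    h = 1.0 / grid
    return h * sum(f((k + 0.5) * h) for k in range(grid))

def check_identity(m, p, n, alpha, g0, F):
    """Return (lhs, rhs) of (4.4.15) for given constants; needs alpha >= 2."""
    t3 = 0.5 * (m - p - 2 * alpha)
    t2 = 0.5 * ((n - 1) * m + g0)
    c = t3 + t2
    # Denominator and numerator of W_m in (4.4.13).
    den = integral(lambda u: u ** (t3 - 1) * (1 - u) ** (alpha - 1) * (1 + u * F) ** (-c))
    num = integral(lambda u: u ** t3 * (1 - u) ** (alpha - 1) * (1 + u * F) ** (-c))
    # Unnormalized E_alpha[U(1 + UF)/(1 - U)].
    extra = integral(lambda u: u ** t3 * (1 - u) ** (alpha - 2) * (1 + u * F) ** (1 - c))
    lhs = num / den
    rhs = (t3 / (t2 - 1)) / F - ((alpha - 1) / (t2 - 1)) / F * (extra / den)
    return lhs, rhs

if __name__ == "__main__":
    print(check_identity(m=40, p=2, n=4, alpha=3, g0=1.0, F=0.7))
```

The two returned numbers should coincide up to quadrature error, and the first term of `rhs` alone gives the upper bound used in (4.4.16).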
Under the subjective prior $\pi_0$ which specifies $r = r_0$,
$\lambda = \lambda_0$ and $\theta \sim N(X_* b_0, (r_0\lambda_0)^{-1}I_m)$, the subjective Bayes
estimator of $\theta$ is given by

\[
e_{SB}(Y_{(s)}, S) = (1 - u_0)Y_{(s)} + u_0 X_* b_0.
\tag{4.4.19}
\]

The following theorem proves the A.O. property of $e_{HB}$.

Theorem 4.4.1. Assume the condition (4.2.2.1) holds. Then
for the prior $\pi_0$ given at the beginning of the section and
for $\alpha \ge 1$, $r_{Q_m}(\pi_0, e_{HB}) - r_{Q_m}(\pi_0, e_{SB}) \to 0$ as $m \to \infty$.

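Proof of Theorem 4.4.1. Let $E_*$ denote the expectation
taken w.r.t. the joint distribution of $Y_{(s)}$ and S specified
by the prior $\pi_0$. Then as in (4.3.10), we have

\[
\begin{aligned}
0 &\le r_{Q_m}(\pi_0, e_{HB}) - r_{Q_m}(\pi_0, e_{SB}) \\
&= m^{-1}E_*\big[(e_{HB}(Y_{(s)}, S) - e_{SB}(Y_{(s)}, S))^T Q_m (e_{HB}(Y_{(s)}, S) - e_{SB}(Y_{(s)}, S))\big] \\
&\le m^{-1}\,\mathrm{ch}_L(Q_m)\,E_*\|e_{HB}(Y_{(s)}, S) - e_{SB}(Y_{(s)}, S)\|^2, \quad \text{by Lemma 4.2.2.1} \\
&= m^{-1}\,\mathrm{ch}_L(Q_m)\,E_*\|(Y_{(s)} - P_{X_*}Y_{(s)})E_\alpha(U \mid Y_{(s)}, S) - u_0(Y_{(s)} - X_* b_0)\|^2 \\
&= m^{-1}\,\mathrm{ch}_L(Q_m)\,E_*\|(Y_{(s)} - P_{X_*}Y_{(s)})(E_\alpha(U \mid Y_{(s)}, S) - u_0) - u_0(P_{X_*}Y_{(s)} - X_* b_0)\|^2.
\end{aligned}
\tag{4.4.20}
\]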
Note that $\mathrm{Cov}_{\pi_0}\big((I_m - P_{X_*})Y_{(s)}, P_{X_*}Y_{(s)}\big) = r_0^{-1}(n^{-1} + \lambda_0^{-1})(I_m - P_{X_*})P_{X_*}
= r_0^{-1}(n^{-1} + \lambda_0^{-1})(P_{X_*} - P_{X_*}) = 0$, since $P_{X_*}$ is
idempotent. So under $\pi_0$, $P_{X_*}Y_{(s)}$ and $(I_m - P_{X_*})Y_{(s)}$ are
statistically independent. On the other hand, from (4.4.3)
it follows that $E_\alpha(U \mid Y_{(s)}, S)$ is a function of $(I_m - P_{X_*})Y_{(s)}$
(since $Y_{(s)}^T(I_m - P_{X_*})Y_{(s)} = ((I_m - P_{X_*})Y_{(s)})^T((I_m - P_{X_*})Y_{(s)})$)
and S. Also $Y_{(s)}$ and S are statistically independent. So
the two terms inside $\|\cdot\|^2$ in (4.4.20) are independently
distributed. From this discussion and noting that
$E_*(P_{X_*}Y_{(s)}) = X_* b_0$, we have

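\[
\begin{aligned}
& \text{the rhs of (4.4.20)} \\
&\quad = m^{-1}\,\mathrm{ch}_L(Q_m)\Big[E_*\|(Y_{(s)} - P_{X_*}Y_{(s)})(E_\alpha(U \mid Y_{(s)}, S) - u_0)\|^2 + E_*\|u_0(P_{X_*}Y_{(s)} - X_* b_0)\|^2\Big] \\
&\quad = m^{-1}\,\mathrm{ch}_L(Q_m)\Big[E_*\big\{(E_\alpha(U \mid Y_{(s)}, S) - u_0)^2\,Y_{(s)}^T(I_m - P_{X_*})Y_{(s)}\big\} + u_0^2\,r_0^{-1}(n^{-1} + \lambda_0^{-1})\,\mathrm{tr}(P_{X_*}^2)\Big] \\
&\quad = m^{-1}\,\mathrm{ch}_L(Q_m)\Big[E_*\big\{(E_\alpha(U \mid Y_{(s)}, S) - u_0)^2\,Y_{(s)}^T(I_m - P_{X_*})Y_{(s)}\big\} + u_0\,r_0^{-1}n^{-1}p\Big],
\end{aligned}
\tag{4.4.21}
\]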
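since $\mathrm{tr}(P_{X_*}^2) = \mathrm{tr}(P_{X_*}) = \mathrm{rank}(X_*) = p$. As in the proof of
Theorem 4.3.1, we can prove $m^{-1}E_*\big\{(E_\alpha(U \mid Y_{(s)}, S) - u_0)^2\,Y_{(s)}^T(I_m - P_{X_*})Y_{(s)}\big\} \to 0$
as $m \to \infty$. Hence from (4.4.20),
(4.4.21) and (4.2.2.1), we have

\[
0 \le \lim_{m\to\infty}\big[r_{Q_m}(\pi_0, e_{HB}) - r_{Q_m}(\pi_0, e_{SB})\big]
\le \limsup_{m\to\infty}\mathrm{ch}_L(Q_m)\,\lim_{m\to\infty} m^{-1}E_*\big\{(E_\alpha(U \mid Y_{(s)}, S) - u_0)^2\,Y_{(s)}^T(I_m - P_{X_*})Y_{(s)}\big\}
= 0.
\]

Hence, $r_{Q_m}(\pi_0, e_{HB}) - r_{Q_m}(\pi_0, e_{SB}) \to 0$ as $m \to \infty$.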

We may now apply this theorem for HB prediction of $\gamma$.
From I - III, it follows that the HB predictor of $\gamma$ is
given by

\[
e_{BF}(Y_{(s)}, S) = \mathrm{Diag}(1 - f_1, \ldots, 1 - f_m)\,Y_{(s)} + \mathrm{Diag}(f_1, \ldots, f_m)\,e_{HB}(Y_{(s)}, S),
\tag{4.4.22}
\]

where $f_i = 1 - n/N_i$, i = 1,..., m. From I it follows that
the subjective Bayes predictor of $\gamma$ under the prior $\pi_0$ is
given by

eSB(^(s)'  S)  =  Diag(l-fi,...,  l-fm)Y(3) 

+  Diag(fi,...,  fm)isB(^(s)'  ^) '  (4.4.23) 
Note that


(Y(,),  s)  -  esB(Y(^),  S) 


=  Dii 


gives 

|iBF(Y(s)'  S)  -  eSB(Y(s)'  ^)f 

<  |iHB(Y(s)'  S)  -  isB(^(s)'  S)|  • 
Using  the  above  inequality,  we  can  easily  show  as  in 


Section  4.3  that  r^J^w^,    egp)  -  ^^J^^q,    egg)  -  0 


as  m  — ►  oo . 


CHAPTER   FIVE 

SIMULTANEOUS  BAYESIAN  ESTIMATION 

OF  SMALL  AREA  VARIANCES 

5.1 Introduction
In this chapter, we are interested in simultaneous HB
estimation of variances from several small areas, where
each small area has a finite number of units. This chapter
is developed following in part the outlines of the
preceding three chapters. We will develop the HB estimator
of the finite population variance vector assuming an
underlying normal linear model for the superpopulation
under a quadratic loss.

We use the notations introduced earlier. Let $\rho = (\rho_1, \ldots, \rho_m)^T$
denote the finite population variance vector
where $\rho_i = (N_i - 1)^{-1}\sum_{j=1}^{N_i}(Y_{ij} - \bar{Y}_i)^2$, i = 1,..., m. We want to
find the HB estimator of $\rho$ and study its asymptotic optimal
property under the loss

\[
L(a, \rho) = m^{-1}(a - \rho)^T Q_m (a - \rho),
\tag{5.1.1}
\]

where $Q_m$ (m x m) is a known n.n.d. and nonnull matrix. This
loss has been used in the previous chapter.
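For concreteness, the loss (5.1.1) can be evaluated as in the following minimal sketch. The function name `quadratic_loss` and the list-of-lists representation of $Q_m$ are illustrative choices, not notation from the text.

```python
def quadratic_loss(a, rho, Q):
    """Loss (5.1.1): m^{-1} (a - rho)' Q (a - rho) for an m x m matrix Q."""
    m = len(a)
    d = [ai - ri for ai, ri in zip(a, rho)]
    return sum(d[i] * Q[i][j] * d[j] for i in range(m) for j in range(m)) / m

if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # With Q the identity this is the average squared error.
    print(quadratic_loss([1.0, 2.0], [0.0, 0.0], identity))
```

Taking $Q_m = I_m$ recovers the average squared error loss mentioned later in this chapter.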
Bayesian estimation of variances from stratified
samples was considered earlier by Ghosh and Lahiri (1987b)
and Lahiri and Tiwari (in press) without incorporating any
auxiliary information. Hartley and Rao (1968) introduced
auxiliary information in the estimation of variances for
stratified samples only in a very special set up. The
present chapter treats the variance estimation problem in
the general framework of Chapter Two.

In Section 5.2, we have developed under the set up of
Section 3.2 (i.e., with known ratios of variance
components) the HB estimator of a quadratic form. We have
used this result to derive explicitly the HB estimator of
$\rho$. This estimator is considered in greater detail in
Section 5.3, in the special case of the nested error regression
model (2.2.3) with $x_{ij} = x_i$, j = 1,..., $N_i$; i = 1,..., m. As
in Chapter Four, we study the asymptotic optimality of the
proposed estimator. Since this problem is much more
algebraically involved than the one we considered for the
means in Chapter Four, we restrict ourselves to this
special case. While we prove that this HB estimator is
asymptotically optimal under the loss (5.1.1) and the
subjective prior $\pi_0$ of Section 4.4, we establish that the
usual sample variance vector turns out to be nonoptimal.
To prove these results, we do not need the $n_i$ to be equal.
So far we have assumed that the ratio of variance
components is known. In Section 5.4, for this special case
again, we derive the HB predictor following the
hierarchical Bayes set up of Section 4.4 when both variance
components are unknown and are assigned gamma priors
(proper or improper). Following the arguments of Section
4.4, we have been able to prove the asymptotic optimality
of our HB predictor under the additional assumption that
all the $n_i$ are equal. Ghosh and Lahiri (1987b) and Lahiri
and Tiwari (in press) have also proved the asymptotic
optimality of their EB estimators under average squared
error loss without requiring the $n_i$ to be all equal. But,
as pointed out earlier, they have not used any auxiliary
information.

5.2   Bayes  Estimation  of  a  Quadratic  Form  when 
Ratios  of  Variance  Components  Known 

We will consider the set up described in Section 3.2.
We are interested in estimating a quadratic form $Y^T F Y$
where F ($N_T \times N_T$) is a known symmetric matrix. Writing
$F = \begin{pmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{pmatrix}$ where $F_{11}$ is $n_T \times n_T$, $F_{12}$ is $n_T \times (N_T - n_T)$
and $F_{22}$ is $(N_T - n_T) \times (N_T - n_T)$, we can write $Y^T F Y$ in
three parts given by
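\[
Y^T F Y = Y^{(1)T}F_{11}Y^{(1)} + 2Y^{(1)T}F_{12}Y^{(2)} + Y^{(2)T}F_{22}Y^{(2)}.
\tag{5.2.1}
\]

Under squared error loss, from (5.2.1) the Bayes estimator
of $Y^T F Y$ is given by

\[
\begin{aligned}
e_B(Y^{(1)}) &= E(Y^T F Y \mid Y^{(1)}) \\
&= Y^{(1)T}F_{11}Y^{(1)} + 2Y^{(1)T}F_{12}E(Y^{(2)} \mid Y^{(1)}) + E(Y^{(2)T}F_{22}Y^{(2)} \mid Y^{(1)}).
\end{aligned}
\tag{5.2.2}
\]

Now from Theorem 3.2.1 we have

\[
E(Y^{(2)} \mid Y^{(1)}) = MY^{(1)}
\tag{5.2.3}
\]

and

\[
V(Y^{(2)} \mid Y^{(1)}) = (n_T + g_0 - p - 2)^{-1}(a_0 + Y^{(1)T}KY^{(1)})\,G,
\tag{5.2.4}
\]

where K, M and G are given by (2.3.4) - (2.3.6). Using
(5.2.3) and (5.2.4) in (5.2.2), we have

\[
\begin{aligned}
e_B(Y^{(1)}) &= Y^{(1)T}F_{11}Y^{(1)} + 2Y^{(1)T}F_{12}MY^{(1)} + (MY^{(1)})^T F_{22}(MY^{(1)}) \\
&\quad + (n_T + g_0 - p - 2)^{-1}(a_0 + Y^{(1)T}KY^{(1)})\,\mathrm{tr}(F_{22}G).
\end{aligned}
\tag{5.2.5}
\]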
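The three-part split of $Y^T F Y$ referred to as (5.2.1) can be illustrated for a single area, where the finite population variance is the quadratic form with $F = (N-1)^{-1}(I_N - N^{-1}J_N)$. The sketch below is an illustration only: the helper names and the data are made up, and the first n entries of the list play the role of the sampled units $Y^{(1)}$.

```python
import statistics

def variance_matrix(N):
    """F = (N-1)^{-1} (I_N - J_N / N), so that y'Fy is the variance with divisor N-1."""
    return [[((1.0 if i == j else 0.0) - 1.0 / N) / (N - 1) for j in range(N)]
            for i in range(N)]

def bilinear(x, A, y):
    """Compute x' A y for list-based vectors and matrices."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def split_quadratic_form(y, n):
    """Return the three parts y1'F11 y1, 2 y1'F12 y2, y2'F22 y2."""
    F = variance_matrix(len(y))
    y1, y2 = y[:n], y[n:]
    F11 = [row[:n] for row in F[:n]]
    F12 = [row[n:] for row in F[:n]]
    F22 = [row[n:] for row in F[n:]]
    return bilinear(y1, F11, y1), 2 * bilinear(y1, F12, y2), bilinear(y2, F22, y2)

if __name__ == "__main__":
    y = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
    parts = split_quadratic_form(y, n=2)
    print(parts, sum(parts), statistics.variance(y))
```

Only the second and third parts involve unsampled units, which is why the Bayes estimator needs just the conditional mean and variance of $Y^{(2)}$ given $Y^{(1)}$.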
From  (5.2.5),  we  will  derive  the  HB  estimator  of  p^    =
N:  2  (1)  _ 


(Nj  -  l)"-^E(Yij  -  Yj)  ,  i  =  I,---,  m-   Recall  that  Y 
l<i<m^  ^   >'  l<i<mV  ^   ^ 


i  =  1,...,  m.   Also,  let  F^  = 


^ill  ^il2 
E'i21  E^i22 


,  where  F";ii  = 


fT^,  and  F.22  =  (Nj  -  D-^lN-.n^  "  ^I^.-n.)-   ^^  ^^  ^^^y 

T 
to  see  that  with  these  notations  p^    =    Y • F jY ^ .   Denote 

e(yP^|Y^'^)  =  (MY^'^)i  and  v(y[^)!y(^^)  =  G^.   Then  from 
(5.2.5),  we  can  write  the  Bayes  estimator  p  ^^    of  p -^    as 

*   (Mi^*')Iei22(MV^'^)i  +  *'-(Ei22Si)-         (5.2.6) 

We will use (5.2.6) in the following sections to find the
Bayes predictor of $\rho$ in the special case of the nested error
regression model.

5.3 Asymptotic Optimality in Nested Error
Regression Model for Known $\lambda$

We will consider the model of Section 4.4 with $\lambda = \lambda_0$
(known), where the $n_i$ are possibly unequal. For our
convenience we will rewrite the model explicitly below.

I. Conditional on $\Theta = \theta$, $B = b$ and $R = r$,
$Y_i \sim N(\theta_i 1_{N_i}, r^{-1}I_{N_i})$, i = 1,..., m independently;

II. conditional on $B = b$ and $R = r$, $\Theta \sim N(X_* b, (\lambda_0 r)^{-1}I_m)$,
where $X_* = \mathrm{col}_{1 \le i \le m}(x_i^T)$ is assumed to have full
column rank p < m;

III. B and R are mutually independently distributed
with $B \sim$ uniform($R^p$) and $R \sim$ gamma$(\tfrac{1}{2}a_0, \tfrac{1}{2}g_0)$ where
$a_0 \ge 0$ and $g_0$ (real) is such that $n_T + g_0 - p > 2$.

We can write I and II as a linear model

\[
Y_{ij} = x_i^T b + v_i + e_{ij},
\tag{5.3.1}
\]

j = 1,..., $N_i$; i = 1,..., m. All the random variables $v_i$ and
$e_{ij}$ are mutually independent with $e_{ij} \sim N(0, r^{-1})$ and
$v_i \sim N(0, (\lambda_0 r)^{-1})$, j = 1,..., $N_i$; i = 1,..., m.

From  (5.3.1),  we  can  write  down  X,  Z,  $  and  D  as 
appeared  in  (2.2.2).   Using  this,  after  some  simplifi- 
cations, it  follows  that  after  writing  u^    =    Xq/ (n -^    +    Aq)  , 
i  =  1,...,  m,  and  b  =  [s^=i(l  -  Ua)xaxj]   S^=i(l  "  "a)?$aY«(s)' 

=  (n.    +  Ao)-l(niY.(^^  +  AoxTb)!^^,^^ 

=  [d  -  "i)Yi(s)  +  "i^T&]iN.-n.'  (^•3-2) 
Gi


v(yP)|y(^>) 


=  (n^  +  go  -  P  -  2)-^(ao  ^   Y^^^-^KY^^))]! 


Ni-"! 


1   1  a=l  ^  ^j 


(5.3.3) 


^iB  =  (Ni  -  l)"'|\(Yij  -  Yi(,))' 


Using  (5.3.2)  and  (5.3.3),  we  have  from  (5.2.6),  after  some 
simpl if i cat  ions , 

""1/       -     x2 
k(Yii  -  Yks), 

+  (N.  -  l)-^nifiU?(Y.(^)  -  xTb)^  +  (Ni  -  1) 
N.  -  u.  +  u.(l  -  u.)xT(^aaaJ)   xJ 
(ao  +  Y^^)'^KY(^^)/(nT,  +  g^  -  p  -  2)      (5.3.4) 


-1 


X  f 


m , 


where  a^  =  (1  -  u^)  Xp  i  =  1,..., 

We will now show that $\rho_{iB}$, i = 1,..., m, is
asymptotically optimal under the loss (5.1.1) and under the
subjective prior $\pi_0$ which specifies that
$Y_i \sim N\big((x_i^T b_0)1_{N_i},\ r_0^{-1}(I_{N_i} + \lambda_0^{-1}J_{N_i})\big)$, i = 1,..., m independently.
To this end, we will have to find the subjective Bayes
estimator $\rho_{iSB}$ of $\rho_i$. Under the loss (5.1.1), for i = 1,...,
m, after some simplifications
''iSB


=  E 


j  =  l 


-    (Yi(s)  -  ^TM   +  (^i  -  ^)''^5'^i(^i  -  "i) 

(5.3.5) 


where  E^   denotes  the  expectation  taken  under  the  prior 

distribution  Wq. 

Let  pQ    =    (?iB'-'  ^ms)   ^""^  £SB  =  (^ISB'"  •'  ^mSe)  ' 
Using  the  notations  of  Chapter  Four,  to  prove  the  A.O.  for 

Pp ,    we  need  to  show 


""Qm 


K'  ^b)  -  ^QmK'  ^Sb)  ^  0  ^« 


m  — ►  oo. 


We establish it in the following theorem assuming the
condition (4.2.2.1).

Theorem 5.3.1. Assume that the condition (4.2.2.1) holds.
Then for the prior $\pi_0$ given above, the model (5.3.1) and
the loss (5.1.1), $\rho_B$, the HB predictor of $\rho$, is
asymptotically optimal.

Proof  of  Theorem  5.3.1.   Using  standard  Bayesian 
calculations,  we  get 

^qj'^o'  ^b)  -  ^Qml'^o'  £sb) 

0  (^B  -  £Sb)  9>n(eB  -  £Sb)J  =  ^'"  ^^^^^  ' 


=  m-^E, 
Since $a_m \ge 0$ for all m, it is enough to prove $\lim_{m\to\infty} a_m = 0$. By
Lemma 4.2.2.1,

- 1  "^     /  \^ 

am  <  chL(Qm)ni   .E  Etq^^ib  "  ^iSBJ 

and  hence  by  condition  (4.2.2.1) 

-1  "*     /  \^ 

Ji^m^am  <  mU<go^*^L(9-n)JA'So™   .J^^l^iB  "  ^iSBJ 

^    I'mU'So'"   .Ji^-ol^iB  -  ^iSBJ  • 

So  it  is  enough  to  show  that 

JA'So-'^.S  E-o(^iB  -  ^iSBf  =  0-  (^•^•^) 

Now, 

(Nj  -  l)(piB  -  ^jsb) 

X  (n^,  +  go  -  p  -  2)-l  -  r5H 

X  (n^,  +  go  -  p  -  2)-l,  (5.3.7) 

and 
(Yi(s)    -    4^)      -    (^(s)    -    ^TM      =   ^i    (^"y>

=    -    [xT(&    -    feo)f    -    2xT(b    -    bo)(Y.(^)    -    xTb). 

Note  that  under  Tq,  Y..  .  -  xjb  and  b  are  independently 
distributed  with  Y.^gs^  -  xjb  ~  wfo,  r5^(nTluTl  -  A5IXT  x 

(J^a«aX)-lx.))  and  b  ~  N^b^ ,  (AQro)  "Xji^-^")"') '   '^^en 
after  some  simplifications 

/        \-l 


E:ro(ti)  =  -(Aoro)-'xil  E^a«aa  1   x. 


and 


m 


-1 


%(*i)  =  2(roAo)-2xT(  l^a^ajj   Xj 


-1 


m 


-1 


2uT^  -  xM  E  a^aa  I   x^ 


<  4(roAo)-Vi{  E  aaaj)   x-u^ 


2  T/  ^ 
^,a=l 


(5.3.8) 


(5.3.9) 


From  (5.3.7)  and  (5.3.8), 


(Ni  -  i)(piB  -  ^ise) 

"i-i(^i  -  ^-o^^O) 

+  hY(ao  +  Y<^'^'^KY^'))/(nT  +  go-P-2)-r5l 


=  f 


rp/       m  'T'X  —  1 

where  h.  =  (Nj  -  u.)  +  u.(l  -  ^  i)^i[^^^^a^a)      Xj.   Note  that 
(1  -  "i)x'[(  S  aaaj)-^x.  =  ^J{^^^^a^l)      ^i    ^"^  ^he  mat

((aT(  E  aaaJ)"-'-a  .))  is  symmetric  idempotent.   Therefore 

aTf  E  aaaj)-la.  <  1.   Using  this  we  get  h.(N.  -  l)"!  < 
~  1  Va=l     ' 

(Ni  -  l)-l[Ni  -  u.  +  u.]  < 


rix  P  = 


2  since  we  will  assume  that  N-  > 


n.  +  1  >  3.   From  all  these  observations,  it  follows  that 


(^iB  -  ^ise)' 


<  2 


n?ut(t.  -  E,„(t.)f 
+  4((ao  +  Y^^)\Y(^>)/(nT,-Hgo-p-2)-r5iy 


Hence 


m-^.S  (piB  -  PiSe)   ^  2m-l_E  nM(^i  "  ^^o^^i)) 

-|2 
+  8[(ao  +  Y^^^\Y^^^)/(nT,  +  go-p-2)  -  r5lj  . 

(5.3.10) 
From  (5.3.9)  and  (5.3.10)  we  have 


m 


1  m     /  \^ 

.i;^%(^iB  -  ^iSBJ 


<  8(roAo) 


+  SEtt 


0 


■^m-^  g  ufn?xT(  2  aaaj)  ^x  ^ 
i  =  l        a=:l 

r 


(5.3.11) 
Now since $u_i = \lambda_0/(\lambda_0 + n_i)$ and $a_i = (n_i/(\lambda_0 + n_i))x_i$,

o       1   m  i-p/  m  iT<\  — 1 

=    8r5^m    ^Eu.(l    -    u.)aM   ^  a-QaiJ      a.j 
i  =  l  a!=l 

<    2ro^m    ^Ea-i     Eaaai         §-i 


2rQ^m    ^tr 


(  E  &aa.a)      (  E  ^\^\) 
^a=l  '^      ^i  =  l  ^ 


=    2r5^ni    ^p 


(5.3.12) 


Recall    that    ^.^    -      ©   [in.    +    -^n^Jn .    »    X  ^    '^    =       cpl     ( In  .X;)    a"*^ 
~  ^  J-         i  =  lL~     1  '-'    ~     ij  l<i<tn^       1     ^' 


(1)/     (1)T^_.     (l)N-l     (1)T    _1 


K    =    E-1    -    Eilx^'^(x^'^'Silx^'y\x^'^*EiJ.       Since    under    .^, 

X^^'    ~  N(x^''^'^bQ,  r^^E^j),  it  follows  from  Rao  (1973,  see 
Result  (vii)  on  p.  188)  that  Y^"^-^  KY  ^  ^    ~  r5^X?  -p-   Then 


it  is  routine  to  check  that 


1  im  Ejr^ 


(^0 


+  y(^>^ky^^> 


)/(n^  +  gQ  -  p  -  2)  - 


-i2 


0 


=  0 


(5.3.13) 


since  nrp  — ►  oo  as  m 


Combining  (5.3.11)  -  (5.3.13),  (5.3.6)  follows.   This 
completes  the  proof  of  Theorem  5.3.1. 

Let  l^    =  (s?,...,  s2)'^  where  sf  =  ("i  "  1)  "'jii(Yi  j  "  Yj  (^^^ 
uming  n.  >  2,  i  =  1,...,  m.   We  will  show  now  in  the  next 


ass 
theorem that under some conditions, this estimator is not
asymptotically optimal.

Theorem 5.3.2. Assume the conditions (4.2.2.9) -
(4.2.2.11) hold. Then for the prior $\pi_0$ given above, the
model (5.3.1) and the loss (5.1.1), the vector of sample variances
$(s_1^2, \ldots, s_m^2)^T$, the traditional
estimator of $\rho$, is asymptotically nonoptimal.

Proof  of  Theorem  5.3.2.   Let  d^i  =  rQ  ("■q,  ^ij)  -  Tq  (tq,  Pen) 
Then 


dm    = 


'm 


■^^TqI^U    -    lsBY9m{p\j    -    Psb) 


_  1    m  /    2  \2 

>    chg(Qm)m        ^^,-^(^3^     -    P^sbJ     '  ^^    Lemma    4.2.2.1 


and    hence 


UiD  dm    >    wUiD  m-1  £  E^    (s?    -    P:ob)^       ^y    (4.2.2.11) 
m— 'oo  m  — CO         ^^^      Ov    i  lotj/ 

It  is  enough  to  prove  that 


After  some  simplifications. 


^iSB  -  -i  =  [("i  -  i)(Ni  -  1)-'  -  i>?  -  ^5^) 


n.(N.  -  l)-^f?u?[(Yi(,)  -  xjbo)2  _  r5lnTluTl], 


,2 


Under $\pi_0$, $s_i^2$ and $\bar{Y}_{i(s)}$ are independently distributed for
$i = 1, \ldots, m$. Moreover $E_{\pi_0}(s_i^2) = r_0^{-1}$. Then
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X    E, 


r.-l„-l„-l 
fq   n.    u. 


_(Yi(3)  -  ^T^'' 

>  2r52f2/(n.    -    1) 

>  2r52(l    -    n./(n.    +    l)f/(n.    -    1)  by    (4.2.2.10) 

>  2r52(K   +    1)"^(K    -    1)-^         by    (4.2.2.9). 


Hence 


1  im  m 
m— ►oo 


"li^^ot? 


^ISb]      ^    2r52(K    +    1)-2(K 


1)-^    >    0, 


The proof of Theorem 5.3.2 is complete.


5.4   Asymptotic  Optimality  in  Nested  Error  Regression  Model 
with  Unknown  Variance  Components 

In the previous section, we assumed the ratio of variance
components was known. Here we will dispense with this
assumption and use the hierarchical model described in
Section 4.4. This is an extension of the model in Section
5.3 where $\Lambda$ $(= \lambda_0)$ is assumed to be known. However, unlike
in Section 5.3, we will assume all the $n_i$ are equal to $n$.

We  will  first  derive  the  HB  predictor  p^^g  of  p.       Note 
that  the  HB  predictor  P ^^q    of  p  .^    is  given  by 
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''iHB 


=  E 


^|Y 


(1)1 


=  E 


(i)\L(i) 


<Pi|A,  Y^^^)|y 


(5.4.1) 


To facilitate the derivation of the inner conditional
expectation in (5.4.1), we will write down the expressions
of the first two moments of the distribution of $Y^{(2)}$ given
$Y^{(1)}$ and $\Lambda$. From III of Section 4.4, it easily follows that

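IV. conditional on $\Lambda = \lambda$, $B$ and $R$ are mutually
independently distributed with $B \sim \text{uniform}(R^p)$ and
$R \sim \text{gamma}\left(\tfrac{1}{2}a_0,\ \tfrac{1}{2}(g_0 - 2a)\right)$.

Write $U = \Lambda/(n + \Lambda)$ and $u = \lambda/(n + \lambda)$. From I, II and IV,
we have as in (5.3.2) and (5.3.3) with appropriate
modifications that

\[ E\left(Y_i^{(2)} \mid \Lambda, Y^{(1)}\right) = \left[(1 - U)\bar{Y}_{i(s)} + Ux_i^Tb\right]1_{N_i - n}, \qquad (5.4.2) \]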


(1)T,.^(1) 


=  (n^  +  gQ  -  2a  -  p  -  2)  ^(aQ  +  Y 


KY^  ^) 
-1 


(5.4.3) 


In this balanced situation, $n_T = nm$ and $Y^{(1)T}KY^{(1)} = S + U\,\mathrm{SSR}$,
where $S = \sum_{i=1}^{m}\sum_{j=1}^{n}(Y_{ij} - \bar{Y}_{i(s)})^2$, $\mathrm{SSR} = n\bar{Y}_{(s)}^T(I_m - P_X)\bar{Y}_{(s)}$
and $b = (XX^T)^{-1}X\bar{Y}_{(s)}$. Note that $b$ does not depend on $\Lambda$.
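The balanced-case quantities above can be illustrated with a small numerical sketch. It assumes, purely for illustration, an intercept-only model ($x_i = 1$ for every area), so that $(I_m - P_X)$ simply centres the area means and $b$ is their average; the data, the value of `lam` and the function names are hypothetical and are not taken from the dissertation.

```python
from statistics import mean


def balanced_summaries(samples):
    """Return (area_means, S, SSR) for a balanced, intercept-only layout."""
    n = len(samples[0])
    assert all(len(area) == n for area in samples), "balanced design required"
    area_means = [mean(area) for area in samples]
    # S: pooled within-area sum of squares.
    S = sum((y - ybar) ** 2
            for area, ybar in zip(samples, area_means) for y in area)
    # With x_i = 1 for all i, (I - P_X) centres the area means, so
    # SSR = n * sum_i (ybar_i - grand_mean)^2.
    grand = mean(area_means)
    SSR = n * sum((ybar - grand) ** 2 for ybar in area_means)
    return area_means, S, SSR


def conditional_mean(ybar_i, fitted_i, lam, n):
    """(1 - u) * ybar_i + u * fitted_i with u = lam / (n + lam), cf. (5.4.2)."""
    u = lam / (n + lam)
    return (1.0 - u) * ybar_i + u * fitted_i


# Hypothetical data: m = 3 areas, n = 3 sampled units each.
samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [5.0, 5.0, 8.0]]
area_means, S, SSR = balanced_summaries(samples)
b = mean(area_means)  # intercept-only version of b
predictions = [conditional_mean(yb, b, lam=3.0, n=3) for yb in area_means]
```

With `lam = 3` and `n = 3` the weight is $u = 1/2$, so each prediction lies halfway between the area sample mean and the fitted value; a smaller ratio would pull the prediction towards the area mean.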
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As  in  Section  5.3,  we  get  from  (5.4.1)  -  (5.4.3)  that 

=    (Ni    -    1)-'.£Jy.  .    -    Y.^^^f    +    (N.    -    l)-lnf.U' 
"^    (Yi(s)    -    4&f   +    (Ni    -    l)"'^i 


-1 


i    -    U    +    U(l    -    U)xjl    E    (1    -    U)x^xM      X. 


X    (aQ    +    S    +    USSRy(nm    +    gQ    -    2a    -    p    -    2) 


Finally,  we  have 


^iHB=  (Ni  -  1)-'.£(V.J  -  Y.(^/ 

+  (N.  -  l)-lnf.(Y.(^)  -  xTbfE«(u2|Y(3),  s) 


-1. 


+  (N.-l)-^f.E« 


{n,-u(.  -.I(J^.,,T)-X^^)} 


(aQ  +  S  +  USSRJlY^-gy  S 


(nm  +  gQ  -  2af  -  p  -  2) 

(5.4.4) 


where $E_a(\cdot)$ is defined in (4.4.5).

Now  we  will  show  the  risk  difference  r^  [ttq  ,  Pog)  - 
'^Qmv  0 '  ^SbJ  ~  ^m  (say)  goes  to  zero  as  m  — >  oo  to  show  that 
the HB estimator is asymptotically optimal. The prior $\pi_0$
is as given in Section 5.3 (also in Section 4.4). We will
state and prove the following theorem. Note that $a$
appearing in the theorem is a prior parameter.

Theorem  5.4.1.   Assume  the  condition  (4.2.2.1)  holds.   Then 
for the prior $\pi_0$ given above and for $a > 1$, the HB predictor
given by (5.4.4) is asymptotically optimal for the balanced
nested error regression model.

Proof of Theorem 5.4.1. As in Theorem 4.4.1, let $E^*$ denote
the expectation taken w.r.t. the joint distribution of $\bar{Y}_{(s)}$
and $S$ specified by the prior $\pi_0$. Then, following the proof
of Theorem 5.3.1, it is enough to show that


0  as  m  — ►  oo , 


•"  .S^E*piHB  -  ^ise] 

Let $E_a(U \mid \bar{Y}_{(s)}, S) = W_m(\bar{Y}_{(s)}, S; a) = W_m$, $E_a(U^2 \mid \bar{Y}_{(s)}, S) = V_m(\bar{Y}_{(s)}, S; a) = V_m$ and $u_0 = \lambda_0/(n + \lambda_0)$. Then

(^iHB  -  ^ise) 

=    (Ni  -  1)  -If  in[(Y  .  (^)  -  xjbf  V„,  -  (y  .  (^)  -  xjbof  ug] 


-1. 


+    (Ni-l)--^f.E« 


|n.-U^1    -xT(J^x,^xT)-^x.^ 


-1, 


X  (aQ  +  S  +  USSR)/(nm  +  gQ-2a-p-2)  -  r5^(N.-UQ) 


-(s)' 


=   fi[tii    +    t2i    +   t3i    +   t^.] 


(5.4.5) 


where 
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'-\2/ 


^li 


Hi 


(5.4.6) 
2 


+    n-lr5lu5lxT(    S^x^xjj      x. 


.-li 


^3i 


and 


^4i 


Then 


(^ 


HB 


(Ni    -    l)-^n(Y.(^)    -    xjb)    (V„,    -    ug), 

(N.    -    l)-^nug|(Y.(^)    -    xjbf    -    (y.(^)    -    xTbo) 

(5.4.7) 

(l    -    NTl)"^{(ao    +   S    +   SSRWm) 

-^    (nm   +   go    -    2a    -    p    -    2)    -    r^^],  (5.4.8) 

-(Ni    -    l)-l|l    -    ^T(J^^k^k)      ^i} 

xEa   U(aQ  +  S  +  USSR)/(nm+gQ-2a-p-2)-r5luQ}|Y^g^,s1. 

(5.4.9) 
\2  4      2 

/  4  \^  4      o 

since    fi    <    1    and    (    E  ^  j  j  1       <    4^tj^. 


Consequently , 


m 


'.S     ^HB    -    ^iSB         ^    4  2-1     E  t] 
1=1  j=l  1=1 


(5.4.10) 


As in Lemma 4.4.2, we can show that $V_m \xrightarrow{a.s.} u_0^2$ under
$\pi_0$. Using this, we can show that $m^{-1}\sum_{i=1}^{m}t_{1i}^2 \xrightarrow{a.s.} 0$. Also,
$m^{-1}\sum_{i=1}^{m}t_{1i}^2$ is uniformly integrable. So we have

\[ \lim_{m \to \infty} m^{-1}\sum_{i=1}^{m}E^*\left(t_{1i}^2\right) = 0. \qquad (5.4.11) \]

Now  since 




■Hi)  = 


(N.  -  l)-2n2u2v^Q 


..\2 


(^i(s)-^T^)  -(^i(s)-4boy 


it  follows  from  a  calculation  in  the  proof  of  Theorem  5.3.1 

that 

(5.4.12) 


lim  m~^  T   E*ft^.)  =  0, 
1— 'oo    -tf-i   V  -^1/ 


m 


Since $N_i > 2$ for all $i$, we have

\[ m^{-1}\sum_{i=1}^{m}t_{3i}^2 \le 2.25\left[\frac{a_0 + S + \mathrm{SSR}\,W_m}{nm + g_0 - 2a - p - 2} - r_0^{-1}\right]^2. \qquad (5.4.13) \]


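Recall that $W_m \xrightarrow{a.s.} u_0$. Also it can be shown that under
$\pi_0$, $S/(m(n - 1)) \xrightarrow{a.s.} r_0^{-1}$ and $\mathrm{SSR}/(m - p) \xrightarrow{a.s.} r_0^{-1}u_0^{-1}$, by
using Markov's inequality and the Borel-Cantelli lemma as in
Section 4.3. From these, it follows that

\[ \left[\frac{a_0 + S + \mathrm{SSR}\,W_m}{nm + g_0 - 2a - p - 2} - r_0^{-1}\right]^2 \xrightarrow{a.s.} 0. \]

Also, it can be shown to be uniformly integrable. Hence

\[ E^*\left[\frac{a_0 + S + \mathrm{SSR}\,W_m}{nm + g_0 - 2a - p - 2} - r_0^{-1}\right]^2 \to 0 \quad \text{as } m \to \infty. \]

Therefore, from (5.4.13),

\[ \lim_{m \to \infty} m^{-1}\sum_{i=1}^{m}E^*\left(t_{3i}^2\right) = 0. \qquad (5.4.14) \]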
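The two almost sure limits used in this step, $S/(m(n-1)) \to r_0^{-1}$ and $\mathrm{SSR}/(m-p) \to r_0^{-1}u_0^{-1}$, can be illustrated by simulation. The sketch below is only indicative: it assumes an intercept-only balanced model ($p = 1$) with unit-level error variance $r_0^{-1}$ and area-effect variance $(\lambda_0 r_0)^{-1}$, and the parameter values, seed and function name are hypothetical.

```python
import random


def simulate_limits(m, n, r0, lam0, b0=1.0, seed=0):
    """Return (S / (m(n-1)), SSR / (m-p)) for one simulated balanced data set."""
    rng = random.Random(seed)
    sd_e = (1.0 / r0) ** 0.5            # unit-level error standard deviation
    sd_v = (1.0 / (lam0 * r0)) ** 0.5   # area-effect standard deviation
    S = 0.0
    area_means = []
    for _ in range(m):
        v = rng.gauss(0.0, sd_v)
        y = [b0 + v + rng.gauss(0.0, sd_e) for _ in range(n)]
        ybar = sum(y) / n
        area_means.append(ybar)
        S += sum((yij - ybar) ** 2 for yij in y)
    grand = sum(area_means) / m
    # Intercept-only model: (I - P_X) centres the area means.
    SSR = n * sum((yb - grand) ** 2 for yb in area_means)
    p = 1
    return S / (m * (n - 1)), SSR / (m - p)


r0, lam0, n = 2.0, 3.0, 5
u0 = lam0 / (n + lam0)
s_ratio, ssr_ratio = simulate_limits(m=4000, n=n, r0=r0, lam0=lam0)
targets = (1.0 / r0, 1.0 / (r0 * u0))   # r0^{-1} and r0^{-1} u0^{-1}
```

For a large number of areas, `s_ratio` and `ssr_ratio` should lie close to the two entries of `targets`; the agreement tightens as `m` grows, which is the behaviour the proof relies on.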


Note that since $0 \le x_i^T\left(\sum_{k=1}^{m}x_kx_k^T\right)^{-1}x_i \le 1$, we have

\[ t_{4i}^2 \le \left[\frac{W_m(a_0 + S) + \mathrm{SSR}\,V_m}{nm + g_0 - 2a - p - 2} - r_0^{-1}u_0\right]^2. \]




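We can also show as before that

\[ \left[\frac{W_m(a_0 + S) + \mathrm{SSR}\,V_m}{nm + g_0 - 2a - p - 2} - r_0^{-1}u_0\right]^2 \xrightarrow{a.s.} 0 \]

and that it is uniformly integrable. From this, we have

\[ \lim_{m \to \infty} m^{-1}\sum_{i=1}^{m}E^*\left(t_{4i}^2\right) = 0. \qquad (5.4.15) \]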

Combining  (5.4.10)  -  (5.4.15)  we  get 
- 1  "*    f-  ~i2 


and  the  theorem  follows. 


CHAPTER  SIX 
SUMMARY  AND  FUTURE  RESEARCH 

6.1   Summary

In this dissertation, a unified model-based HB
prediction  theory  is  developed  for  small  area  estimation  as 
well  as  for  animal  breeding  and  comparative  experiments. 
On  the  basis  of  the  analysis  of  two  data  sets  in  Chapter 
Two, it is apparent that the HB analysis is a viable
alternative  to  the  existing  frequentist  methods  of 
inference.   Especially,  for  complex  models,  often  we  do  not 
have  closed  form  expressions,  or  suitable  approximations, 
of  the  MSEs  of  the  EBLUPs  or  the  EB  predictors.   On  the 
other  hand,  the  Bayesian  procedures  provided  in  this 
dissertation  can  be  used  routinely  given  the  present  state 
of  computing  facilities. 

In  the  special  case  of  known  ratios  of  variance 
components,  the  HB  predictor  was  shown  to  be  BLUP  for  a 
vector  of  linear  functions  of  the  finite  population 
observation  vector  or  the  vector  of  effects,  without  any 
distributional  assumption.   Moreover,  for  a  suitable  class 
of  elliptically  symmetric  distributions,  it  was  shown  (a) 
to be the BUP, (b) to universally or stochastically
dominate all linear unbiased predictors, and (c) to be the
best  equivariant  predictor  under  suitable  groups  of 
transformations.   So  if  one  has  a  fairly  good  idea  of  the 
approximate  values  of  the  variance  components  based  on  past 
data,  one  may  incorporate  that  information  in  the  prior  and 
expect  that  the  proposed  HB  predictor  in  the  general  case 
will  be  approximately  optimal.   This  was  supported  by  the 
asymptotic optimality property we established for a few
specific models. Also, we proved the asymptotic optimality
of  an  HB  predictor  of  the  finite  population  variance  vector 
in  an  important  special  case. 

6.2   Future Research
We  have  proposed  an  HB  model  which  is  applicable  when 
we  have  a  single  characteristic  for  each  unit  in  the 
population.   In  survey  sampling  and  animal  breeding 
experiments,  it  is  quite  common  to  have  more  than  one 
characteristic  for  each  unit  and  characteristics  are 
correlated within a unit. A useful line of research would be
to provide a suitable multivariate extension of the proposed
model to include problems of this type. Another important
problem which is not considered is that of longitudinal
data. Here, for each individual unit, we measure one or
more characteristics over time. An HB model, under such
circumstances, involves generalization of the present model
to take into account both the within-unit and between-unit
variation.

The  present  dissertation  did  not  address  the 
robustness issue except for the study of universal and
stochastic domination which in some sense provides
robustness for the bowl-shaped loss functions. One
important  issue  is  the  prior  robustness.   For  the  vector  of 
fixed  effects,  we  have  used  uniform  prior  which  can  be 
viewed  as  a  diffused  multivariate  normal  prior.   The 
performance  of  the  proposed  HB  predictor  for  other  priors 
whose  tails  are  heavier  than  the  normal  (e.g.,  multivariate 
t)  is  a  topic  for  future  study. 

In  the  preceding  two  chapters,  we  have  studied  the 
asymptotic optimality of the HB predictors for special
models  with  two  variance  components  when  both  variance 
components  are  unknown.   It  might  be  worth  exploring 
whether  the  techniques  used  there  can  be  generalized  to 
problems  with  more  than  two  variance  components. 

Finally,  in  this  dissertation,  we  assume  no 
nonsampling  error,  measurement  error,  bias  or  nonresponse 
so  that  once  a  sample  is  drawn,  the  value  of  the 
characteristic  is  known  for  sure.   So  in  predicting  finite 
population  means  or  variances,  we  did  not  change  the 
sampled vector which corresponds to the seen part of the
characteristic vector. Estimation of the finite population
means or variances in the presence of response bias or
measurement error can also be explored.




APPENDICES 


APPENDIX   A 
PROOF  OF  THEOREM  2.3.1 


Under the assumptions of Theorem 2.3.1, the joint
pdf of $Y$, $B$, $R$ and $\Lambda$ is given by


f(y,  b,  r,  A) 

|sfW[-^r(y  -  Xb)TE-l(y  -  Xb)] 


oc  r 


In    1 


X  exp 


ISi-l. 


=  E  ^exp 


(-laor)r'*0-',xp(-lr.Ea.A.).n(Air)^''   rt 

-ir{(y  -  Xb)TE-l(y  -  Xb)  +  ag  +  £'i-*i} 
"■  i  =  l 


X  r 


K^T-^iSo«i)-lt   |g.-l 
iSl'i 


(A.l) 


Now,

\[ (y - Xb)^T\Sigma^{-1}(y - Xb) = (b - \hat{b})^TX^T\Sigma^{-1}X(b - \hat{b}) + y^TQy, \qquad (A.2) \]

where $\hat{b} = (X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}y$ and $Q = \Sigma^{-1} - \Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}$.
From (A.1) and (A.2), one gets, integrating $b$ out, the
joint pdf of $Y$, $R$ and $\Lambda$ is given by
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^(y»  r,  A)  ex  |5|  |X  E   X|   r 


X  exp 


-grl  a 


1  =  1  / 


i  =  l 


(A. 3) 
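The quadratic-form decomposition (A.2), which is what allows $b$ to be integrated out, can be checked numerically in a simple special case. The sketch below assumes $\Sigma = I$ and a single regressor ($p = 1$), so that $y^TQy$ reduces to the ordinary residual sum of squares; the data and the trial value of `b` are hypothetical.

```python
def decomposition(y, x, b):
    """Return (lhs, rhs) of (A.2) for Sigma = I and one regressor column x."""
    xtx = sum(xi * xi for xi in x)
    b_hat = sum(xi * yi for xi, yi in zip(x, y)) / xtx   # (X'X)^{-1} X'y
    lhs = sum((yi - xi * b) ** 2 for xi, yi in zip(x, y))
    # y'Qy with Q = I - X (X'X)^{-1} X' is the residual sum of squares.
    y_q_y = sum((yi - xi * b_hat) ** 2 for xi, yi in zip(x, y))
    rhs = (b - b_hat) ** 2 * xtx + y_q_y
    return lhs, rhs


lhs, rhs = decomposition(y=[1.0, 3.0, 2.0], x=[1.0, 2.0, 3.0], b=0.25)
```

The two sides agree for any trial value of `b`, which is why the dependence on $b$ is confined to the first (Gaussian-kernel) term.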


Now, integrating w.r.t. $r$, one finds the joint pdf of $Y$ and
$\Lambda$ is given by


f(y,  A)  (X  |E|  2|X^E~^X|  ^ 


X   a 


0  +  .E -i^i  +  y'gy^ 


■KNT+.So^i-P)  ,  Jg._i 


i5/i 


(A. 4) 


Similarly, starting with the joint pdf of $Y^{(1)}$, $B$, $R$ and $\Lambda$,
one gets the joint pdf of $Y^{(1)}$ and $\Lambda$ is given by


(1) 


{yy\   a)  «  |E,,| 


v(l)T^-lv(l) 


ao  +  .E  a.A.  +  y^    Ky  ^  ^j 


i  =  l 


(A. 5) 


So the conditional pdf of $Y^{(2)}$ given $\Lambda = \lambda$ and $Y^{(1)} = y^{(1)}$
is given by


Y  (2),,       (i)n        -f(y>  h) 
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oc 


{\w\^ii\yHk^^~h\/ 


x^'^Wi^^'^ 


(  t  (1)T       (l)\K"T+i?o^i-P) 


(A. 6) 


Now, 


^    -22.  li- 


^21-11-         ) 


gives 


X'^E'^X 


(1)T    _i     (1)     ^     /^(2)  ^      v-lv(l)^'r 


X^    '    E7|X'    '    +     X 


-11- 


(x^--^^  -  S,,Slix^^') 


^-1       /^C^)  ^       K-lx^^^^l 

^22. iV-  "    -21-11-         ) 


X^^>^Eilx^^>         -(X(^)-E2i?llx(^y 


(2)  ^-1^(1) 

A  -  4^21-11- 


^22.1 


|522.l| 


v(l)T^-ly(l) 


=■22 


(ihT 


,  .  (x^^^  -  ?2i?li^^^0 


X      X 


(x('>^£llx(^Y'(x'''  -  S2i5lJx^'^) 


^    1^22.11' 


(A.  7) 


by using Exercise 2.4 on p. 32 of Rao (1973) twice. Again
using the same exercise, we have

\[ |\Sigma| = |\Sigma_{11}|\,|\Sigma_{22.1}|. \qquad (A.8) \]


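Using the definition of $G$ from (2.3.6) and (A.7) and (A.8)
we have

\[ |X^T\Sigma^{-1}X| = \frac{\left|X^{(1)T}\Sigma_{11}^{-1}X^{(1)}\right|\,|G|}{|\Sigma|/|\Sigma_{11}|}, \]

i.e.

\[ \frac{|X^T\Sigma^{-1}X|}{\left|X^{(1)T}\Sigma_{11}^{-1}X^{(1)}\right|}\cdot\frac{|\Sigma|}{|\Sigma_{11}|} = |G|. \qquad (A.9) \]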


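Now using a result of Sallas and Harville (1981) we have

\[ Q = \lim_{\epsilon \to \infty}\left(\Sigma + \epsilon XX^T\right)^{-1} = \lim_{\epsilon \to \infty}B^{-1}(\epsilon) \quad \text{(say)}. \qquad (A.10) \]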
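The limit in (A.10) can be seen numerically in a two-dimensional toy case. This is only a sketch under stated assumptions: the matrix `sigma` and the column `x` are hypothetical ($n = 2$, $p = 1$), and the helper `inv2` is a hand-written 2x2 inverse rather than a library routine.

```python
def inv2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


sigma = [[2.0, 1.0], [1.0, 3.0]]   # hypothetical p.d. matrix
x = [1.0, 1.0]                     # hypothetical X with n = 2, p = 1

sigma_inv = inv2(sigma)
# Sigma^{-1} X and the scalar X' Sigma^{-1} X.
si_x = [sum(sigma_inv[i][j] * x[j] for j in range(2)) for i in range(2)]
xt_si_x = sum(x[i] * si_x[i] for i in range(2))
# Q = Sigma^{-1} - Sigma^{-1} X (X' Sigma^{-1} X)^{-1} X' Sigma^{-1}.
q_mat = [[sigma_inv[i][j] - si_x[i] * si_x[j] / xt_si_x for j in range(2)]
         for i in range(2)]


def max_gap(eps):
    """Largest entrywise gap between (Sigma + eps X X')^{-1} and Q."""
    b = [[sigma[i][j] + eps * x[i] * x[j] for j in range(2)] for i in range(2)]
    b_inv = inv2(b)
    return max(abs(b_inv[i][j] - q_mat[i][j])
               for i in range(2) for j in range(2))


gaps = [max_gap(eps) for eps in (1e2, 1e4, 1e6)]
```

The gaps shrink roughly in proportion to $1/\epsilon$, consistent with $B^{-1}(\epsilon)$ converging to $Q$ as $\epsilon \to \infty$.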


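We partition $B(\epsilon)$ into

\[ B(\epsilon) = \begin{pmatrix} B_{11}(\epsilon) & B_{12}(\epsilon) \\ B_{21}(\epsilon) & B_{22}(\epsilon) \end{pmatrix} \qquad (A.11) \]

where $B_{ij}(\epsilon) = \Sigma_{ij} + \epsilon X^{(i)}X^{(j)T}$ $(i, j = 1, 2)$. Next, using the
standard formula for partitioned matrices (e.g., p. 46 of
Searle, 1971),

\[ y^TB^{-1}(\epsilon)y = y^{(1)T}B_{11}^{-1}(\epsilon)y^{(1)} + \left(y^{(2)} - B_{21}(\epsilon)B_{11}^{-1}(\epsilon)y^{(1)}\right)^TB_{22.1}^{-1}(\epsilon)\left(y^{(2)} - B_{21}(\epsilon)B_{11}^{-1}(\epsilon)y^{(1)}\right), \qquad (A.12) \]

where $B_{22.1}(\epsilon) = B_{22}(\epsilon) - B_{21}(\epsilon)B_{11}^{-1}(\epsilon)B_{12}(\epsilon)$. Using a
formula similar to (A.10), and recalling the definition of
$K$ in (2.3.4) we have

\[ \lim_{\epsilon \to \infty}B_{11}^{-1}(\epsilon) = K. \qquad (A.13) \]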

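Next, observe that

\[ B_{21}(\epsilon)B_{11}^{-1}(\epsilon) = \Sigma_{21}B_{11}^{-1}(\epsilon) + \epsilon X^{(2)}X^{(1)T}B_{11}^{-1}(\epsilon). \qquad (A.14) \]

Again, by Sallas and Harville (1981),

\[ \lim_{\epsilon \to \infty}\epsilon X^{(1)T}B_{11}^{-1}(\epsilon) = \left(X^{(1)T}\Sigma_{11}^{-1}X^{(1)}\right)^{-1}X^{(1)T}\Sigma_{11}^{-1}. \qquad (A.15) \]

From (A.13) - (A.15) and (2.3.5), it follows that

\[ \lim_{\epsilon \to \infty}B_{21}(\epsilon)B_{11}^{-1}(\epsilon) = \Sigma_{21}K + X^{(2)}\left(X^{(1)T}\Sigma_{11}^{-1}X^{(1)}\right)^{-1}X^{(1)T}\Sigma_{11}^{-1} = M. \qquad (A.16) \]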
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Now,  note  that 

B2i(<)Bj}(0Bi2(0 

(A. 17) 

Now since $X^{(1)}$ is of full column rank, there exists
some matrix $F$ s.t. $X^{(2)} = FX^{(1)}$. Then

(1)^(2)T    ^^   //     ^  .v(l)v(l)T\-lv(l)lv(2)T 


=  eFX^^^X^"^^  -  F?iJ(?,i  +  eX^'^X^^^^) 


X^  'fSX 


(A. 18) 


From  (A. 17)  and  (A. 18),  it  follows  that 


§22.l(0  =  B22(')  -  B2i(0BiJ(€)B^2(0 


-  ?22  ~  ?21-ll(^)-12 
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^  ^  (A. 19) 

Now    using    (A. 13)    and    (A. 15)    we    have    from    (A. 19)    that 
1  im  Boo    1  (f) 


=    E 


-lv(lVv(l)Tv-lv(l)r^v(2)T 


^22 


1    u;/    u;  iv-iyUA-  Y 
-    l'21*^='12    "    ^21^1 1<>        [^  ^11'^        j      '^ 


(2).    (1)T^_.     (l)N-l     (1)T    _1 


LV 


-    ^22 


-    ^        (^  -11-         j      ^^    '     5iii^l2 

r     r-is       +  s     K-ix^^Vx^^^'^E-ix*^^^r^ 

^21-11-12    +    ^21-11-         i-  -11-         J 

.  x(^'-Slj5,3  -  5,,Sllx(^>(x<>)-5-x(^))-\f^)- 
-  x<^\x('>'^sjix('>)-'x(i>'^Ejj5i2 

=  .,,,,.  (x(-)-5..Sl5x(^))(x(')-rjJx<>')- 

X  (X     -  E<2-|E7|X    1  ,  after  some  rearrangements 
=  G      using  (2.3.6).  (A. 20) 

Note that $G$ is p.d. Now from (A.10), (A.12), (A.13) and
(A.20), one gets
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(A. 21) 

Now  from  (A. 6),  (A. 9)  and  (A. 21)  it  follows  that 
(2),.   ..(Ih 


^(/'^|A,  /^O 


Gi  'U  -  i-.h  -  /^>\/^>)-^^T-"^^ 


oc  ■■•■  2 


/       t  (i)T   (l)M-KNT+.^ogi-P) 

^      '-^  '^  (A. 22) 

Writing $\Omega = \left(n_T + \sum_{i=0}^{t}g_i - p\right)^{-1}\left(a_0 + \sum_{i=1}^{t}a_i\lambda_i + y^{(1)T}Ky^{(1)}\right)G$,
it follows from (A.22) that

(2),,   ,.(1> 


=(/  'li.  y'  ') 


oc 


sr-T  +  isi-  P  +  (y^'^  -  5!y(i)) 


(^)  -  MY^^Yfi-l 


i=0 


X  (y^^^  -  My^'^)        ^-"  .     (A. 23) 


From (A.23) and (2.3.2), it follows that the
conditional distribution of $Y^{(2)}$ given $Y^{(1)} = y^{(1)}$ and
$\Lambda = \lambda$ is an $(N_T - n_T)$-variate $t$-distribution with d.f.
$n_T + \sum_{i=0}^{t}g_i - p$, location parameter $My^{(1)}$ and scale
parameter $\Omega = \left(n_T + \sum_{i=0}^{t}g_i - p\right)^{-1}\left(a_0 + \sum_{i=1}^{t}a_i\lambda_i + y^{(1)T}Ky^{(1)}\right)G$.

Also, since $f(\lambda \mid y^{(1)}) = f(y^{(1)}, \lambda)/f(y^{(1)}) \propto f(y^{(1)}, \lambda)$, it follows
from (A.5) that


f 


(1) 


(^|y'  0  «  |?iil 


^^'^Wi^^'^ 


i  =  l 


X   a 


0.  t^,^,.y'''\y'<'"''*-^=o''''" 

i  =  l  / 


APPENDIX   B 

AN  INDEPENDENCE  RESULT  IN  A  FAMILY  OF 

ELLIPTICALLY  SYMMETRIC  DISTRIBUTIONS 

Lemma B.1. Consider a family of distributions $\mathcal{P} = \{\mathcal{E}_f(X\beta, \sigma^2\Sigma) : \beta \in R^p,\ \sigma > 0\}$ where $\mathcal{E}_f(\cdot, \cdot)$ is as defined by
(3.3.18), $X$ $(n \times p)$ is a known matrix of rank $p$ and $\Sigma$ $(n \times n)$
is a known p.d. matrix. Then for a random variable $Y$ $(n \times 1)$
with distribution $\mathcal{E}_f(X\beta, \sigma^2\Sigma)$, $s(Y) = \left(s_1(Y), s_2(Y)^T\right)^T$ and $z(Y)$ are
independently distributed, where $s_1(Y) = (Y^TQY)^{1/2}$, $s_2(Y) = (X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}Y$, $z(Y) = QY/s_1(Y)$ and $Q = \Sigma^{-1} - \Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}$.


Proof of Lemma B.1. To prove this lemma, we need to verify
the conditions of Proposition 7.19 of Eaton (1983). First
we will note that $s(Y)$ is sufficient for the family of
distributions $\mathcal{P}$. Note that $Y$ has pdf

\[ f(y; \beta, \sigma) \propto f\left((y - X\beta)^T\Sigma^{-1}(y - X\beta)/\sigma^2\right) \qquad (B.1) \]

and

\[ (y - X\beta)^T\Sigma^{-1}(y - X\beta)/\sigma^2 = y^T\Sigma^{-1}y/\sigma^2 - 2\beta^TX^T\Sigma^{-1}y/\sigma^2 + \beta^TX^T\Sigma^{-1}X\beta/\sigma^2. \qquad (B.2) \]




Recall from (3.3.16) - (3.3.18) that $f$ is assumed to be
known. From (B.1) and (B.2), it follows that $\left(Y^T\Sigma^{-1}Y,\ (X^T\Sigma^{-1}Y)^T\right)^T$ is
sufficient for $\mathcal{P}$. But

\[ y^TQy = y^T\Sigma^{-1}y - (X^T\Sigma^{-1}y)^T(X^T\Sigma^{-1}X)^{-1}(X^T\Sigma^{-1}y) \]

gives that $\left(Y^T\Sigma^{-1}Y,\ (X^T\Sigma^{-1}Y)^T\right)^T$ and $s(Y)$ are one-to-one functions of
each other. So $s(Y)$ is a sufficient statistic for the
family $\mathcal{P}$.

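Let $\mathcal{X} = R^n - \mathcal{C}(X)$ where $\mathcal{C}(X) = \{X\beta : \beta \in R^p\}$ is the
vector space generated by the columns of the matrix $X$.
Also, let $\mathcal{A} = \{B \in \mathcal{B}^n : B \subset \mathcal{X}\}$ where $\mathcal{B}^n$ is the product $\sigma$-algebra
in $R^n$. We have chosen $\mathcal{X}$ in this way so that for
$y \in \mathcal{X}$, $y^TQy > 0$ and $z(y)$ is defined. Note that $P_{\beta,\sigma}[Y \in \mathcal{X}] = 1$.

Also denote the Cartesian product of $R^+$, the positive
part of the real line, and $R^p$ by $\mathcal{Y}$, i.e., $\mathcal{Y} = R^+ \times R^p$, and
the $\sigma$-algebra of the subsets of $\mathcal{Y}$ by $\mathcal{C}_1 = \{C \in \mathcal{B}^{p+1} : C \subset \mathcal{Y}\}$.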

Now, consider the group $\mathcal{G} = (G, \circ)$ where the set $G$ is
given by $G = \{g : g = (d, \beta^T)^T,\ d > 0,\ \beta \in R^p\}$ and $\circ$ is a binary
operation which we define below. For $g_i = (d_i, \beta_i^T)^T \in G$,
$i = 1, 2$, we define $g_1 \circ g_2 = \left(d_1d_2,\ (d_1\beta_2 + \beta_1)^T\right)^T$. With this
definition, $\mathcal{G}$ is a group with $(1, 0^T)^T$ as the identity element
and $g^{-1} = \left(d^{-1},\ -d^{-1}\beta^T\right)^T \in G$ as the inverse element of $g \in G$.
For $y \in \mathcal{X}$, $G$ acts on $\mathcal{X}$ from the left by $g(y) = dy + X\beta \in \mathcal{X}$.
Note that the result of $G$ acting on $\mathcal{X}$ from the left is
equivalent to the group of transformations

\[ \mathcal{G}_2 = \{g_{\beta,d} : d > 0,\ \beta \in R^p,\ g_{\beta,d}(y) = dy + X\beta\} \]

given by (3.5.15).
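The composition rule for this affine group can be checked directly: acting with $g_1 \circ g_2$ should agree with acting with $g_2$ first and then $g_1$. The sketch below assumes a single regressor ($p = 1$), so that $\beta$ is a scalar and a group element is a pair `(d, beta)`; the design column, data and group elements are hypothetical.

```python
def compose(g1, g2):
    """g1 o g2 = (d1 * d2, d1 * beta2 + beta1)."""
    d1, b1 = g1
    d2, b2 = g2
    return (d1 * d2, d1 * b2 + b1)


def inverse(g):
    """Inverse element (1/d, -beta/d)."""
    d, b = g
    return (1.0 / d, -b / d)


def act(g, y, x):
    """g(y) = d * y + x * beta for a single regressor column x."""
    d, b = g
    return [d * yi + xi * b for yi, xi in zip(y, x)]


x = [1.0, 2.0, 4.0]      # hypothetical design column
y = [0.5, -1.0, 3.0]     # hypothetical data vector
g1, g2 = (2.0, 0.5), (4.0, -1.5)

lhs = act(compose(g1, g2), y, x)     # (g1 o g2)(y)
rhs = act(g1, act(g2, y, x), x)      # g1(g2(y))
identity = compose(g1, inverse(g1))  # should be (1, 0)
```

The agreement of `lhs` and `rhs` is what makes $y \mapsto dy + X\beta$ a left action of the group.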

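Consider another group $\bar{\mathcal{G}} = (\bar{G}, *)$ where $\bar{G} = \{\gamma : \gamma = (d, \beta^T)^T \in R^+ \times R^p\}$ and $*$ is a binary operation which we define
for $\gamma_i = (d_i, \beta_i^T)^T \in \bar{G}$, $i = 1, 2$, by $\gamma_1 * \gamma_2 = \left(d_1d_2,\ (d_1\beta_2 + \beta_1)^T\right)^T$. It is
easy to verify that $\bar{\mathcal{G}}$ is a group with $(1, 0^T)^T \in \bar{G}$ as the
identity element and $\gamma^{-1} = \left(d^{-1},\ -d^{-1}\beta^T\right)^T \in \bar{G}$ as the inverse
element of $\gamma \in \bar{G}$.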

Now, we will show that $\bar{G}$ is a homomorphic image of $G$.
Define the function $\eta$ from $G$ onto $\bar{G}$ by

where $g = (d, \beta^T)^T \in G$. It is easy to verify that for $g$, $g_1$
and $g_2 \in G$, $\eta(g_1 \circ g_2) = \eta(g_1) * \eta(g_2)$ and $\eta(g^{-1}) = [\eta(g)]^{-1}$.
Then the function $\eta$ is a homomorphism from $G$ onto $\bar{G}$ and $\bar{G}$
is a homomorphic image of $G$.

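For $s = (s_1, s_2^T)^T \in \mathcal{Y}$, $\bar{G}$ acts on $\mathcal{Y}$ from the left by

\[ \gamma(s) = \left(ds_1,\ (ds_2 + \beta)^T\right)^T. \]

Sometimes we will call $G$ a group without explicitly
referring to the group operation.

Note that $\bar{G}$ acts measurably on $(\mathcal{Y}, \mathcal{C}_1)$ and $\bar{G}$ is a
locally compact and $\sigma$-compact topological group (with the
usual topology in $R^{p+1}$) with $\sigma$-algebra $\mathcal{C}_1$. Also, the
mapping $(\gamma, s) \to \gamma(s) = \left(ds_1,\ (ds_2 + \beta)^T\right)^T$ from $(\bar{G} \times \mathcal{Y},\ \mathcal{C}_1 \times \mathcal{C}_1)$ to $(\mathcal{Y}, \mathcal{C}_1)$
is measurable. Here $\mathcal{C}_1 \times \mathcal{C}_1$ is the product $\sigma$-algebra of $\mathcal{C}_1$
with itself. So the conditions of Proposition 7.17 of
Eaton (1983) are satisfied for $\bar{G}$ and the measurable space
$(\mathcal{Y}, \mathcal{C}_1)$. Also note that $\bar{G}$ acts transitively on $\mathcal{Y}$. Since

\[ s(g(Y)) = s(dY + X\beta) = \left(ds_1(Y),\ (ds_2(Y) + \beta)^T\right)^T = \gamma(s(Y)), \]

$s$ from $\mathcal{X}$ to $\mathcal{Y}$ is measurable and equivariant.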
It is easy to check that

\[ Q(dY + X\beta) = dQY \qquad (B.3) \]

and

\[ s_1(dY + X\beta) = ds_1(Y). \qquad (B.4) \]

The definition of $z(Y)$ and (B.3) and (B.4) imply that
$z(dY + X\beta) = z(Y)$. Hence $z(Y)$ is a $G$-invariant function.
Moreover, if $Y \sim \mathcal{E}_f(0, \Sigma)$, then the distributions of
the family of random variables $\{g(Y) = dY + X\beta : g \in G\}$ are
given by the family of distributions $\mathcal{P}$. So all the
conditions of Proposition 7.19 of Eaton (1983) are
verified. The independence of $s(Y)$ and $z(Y)$ follows.
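The equivariance of $s(Y)$ and the invariance of $z(Y)$ under $Y \mapsto dY + X\beta$ can be illustrated in the simplest special case. The sketch assumes $\Sigma = I$ and $X$ equal to a column of ones, so that $s_2$ is the sample mean, $QY$ is the vector of centred observations and $s_1$ is the root of the centred sum of squares; the data and the values of `d` and `beta` are hypothetical.

```python
import math


def statistics_s_z(y):
    """Return (s1, s2, z) for Sigma = I and X a column of ones."""
    n = len(y)
    s2 = sum(y) / n                             # (X'X)^{-1} X'y
    resid = [yi - s2 for yi in y]               # Qy with Q = I - J/n
    s1 = math.sqrt(sum(e * e for e in resid))   # (y'Qy)^{1/2}
    z = [e / s1 for e in resid]                 # Qy / s1
    return s1, s2, z


y = [1.0, 4.0, 2.0, 7.0]
d, beta = 3.0, -2.0

s1, s2, z = statistics_s_z(y)
# Transformed data d*y + X*beta, with X a column of ones.
t1, t2, w = statistics_s_z([d * yi + beta for yi in y])
```

Here `(t1, t2)` equals `(d * s1, d * s2 + beta)` up to rounding, while `w` coincides with `z`: the sufficient statistic moves with the group and the maximal-invariant-type statistic does not, which is the structure Proposition 7.19 exploits.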
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